Shredding CSV strings in SQL Server

Hi all!

So today I thought I’d go over how to break out a comma separated string stored in sql server into individual rows.

This can be a real PITA, if you’ve ever had to do this you’ll know why. Now before I start this post I want to just say that this is a simplistic example and has one main assumption, which is the each of the values within the string you are wanting to split out are of a fixed width. The number of values can be totally variable but the values within the string must be of a static length.

If you have a string with variable with values this approach isn’t what you need, however it might give you the starting point Smile

So lets assume we are working with a single field called “version” for example, and it contains strings such as:

image

What I want to do is get a line per version, now I’ve seen people use while loops, cursors and all sorts of craziness to solve this issue, when in fact it can be done in a single statement, using our old friend the numbers table!

So the first challenge is to determine how many values we are working with, I find the simplest way to do this is to count the number of delimiters and add one, something like the following:

(LEN([Version]) - LEN(REPLACE([Version], ',', ''))) + 1

What I’m doing here is taking the original length of the column we want to split then subtracting the length of the same column but with the delimiters removed, giving me the number of delimiters in the string, I simply then add one to that number to get the count of unique values I have to get, simples! Open-mouthed smile

Ok so now we have the number of values, how do I go about breaking them out? well firstly, as mentioned earlier, we are going to need a numbers table. Now if you have one of these already in your database great, if not then there is actually a built in one you can use! You can see this table for yourself by running the following:

SELECT N.number
FROM master..spt_values AS N
WHERE N.type = 'P'

Cool, no? So we have a number table and the number of items we need to work with so lets start building the query:

SELECT S.[Version], N.number
FROM dbo.Software AS S,
master..spt_values AS N
WHERE N.type = 'P'
AND N.number < (LEN(S.[Version]) - LEN(REPLACE(S.[Version], ',', '')))

Firstly, yes I KNOW comma “joins” are the devils play thing but in this case its fine, honest! Open-mouthed smile 

This will give us a row per item, in the list. You’ll notice that I’ve left off the +1 on the end if the LEN statements at the bottom. This is because the number table is base 0 and this is very important to how this process works.

So now we add the following to the select clause:

SUBSTRING(
    REPLACE(REPLACE(S.[Version], ' ', ''), ',', ''),
( 4 * N.number ) + 1, 4) AS VersionNum

This might look a bit crazy but stick with it. So the middle line of the three is removing all spaces and commas (you need to remove anything that is not part of the values you wish to split out)

Then we use the result of that as the source for the substring, now the “4 * N.number” part is important because 4 is the length I know each of my values are, so modify this number to suit you. The “+ 1” part is due to the fact that the number table is base 0.

The final “,4” is how many characters you wish to extract from that value, as I wanted the whole value I have used 4, if for example I only wanted the part in front of the decimal place (referring back to my original example at the beginning of the post) then I’d change this value only to “,2”.

The end result looks like this:

image

So we’ve taken our original string “Version” and created a row per individual version number. All in a set based manner without cursors or while loops or anything else, which will always be more performant. 

Hope this helps you out of string shredding hell Smile

Enjoy!

Dave

Advertisements

RedGate SQL Prompt Competition Winners!

Well we’ve had some really cracking entries for this competition and I have really hard time choosing only five, you’ve all made smile in one way or another Smile

Red Gate logo

But the lucky five winners are (in no particular order):

  • Andy Cutler for his plasticine rabbit
  • Alan Dykes for “SQL Prompt gives you super powers”
  • Koen Verbeeck for little bobby tables (an oldie but a goodie!)
  • Szymon Wojicik for his Charlie and the chocolate factory motivational poster
  • Dave Green for his need for the “yell” SQL Prompt Easter egg Smile

Pop onto the comments of the main post to see everyone’s entry.

Well done all, please drop me a PM on twitter and I’ll give you all the information you need to go claim your licence! Thanks to all that took part and happy sql prompting! Open-mouthed smile

Competition – RedGate SQL Prompt!

Hi all, so following my recent post about the whole host of great tools available to help us work with SQL Server and the BI stack, the awesomesauce guys at RedGate (@RedGate | http://www.red-gate.com/) got in touch.

Red Gate logo

They enjoyed my post so much they have honoured me by giving me four SQL Prompt licences to give away to you, my readers (you lucky lucky people!) Open-mouthed smile

As its a Friday and as RedGate as such cool people as their tools will make you smile, I thought the best way to choose who gets these is simple, simply tweet a link to this blog post and then put a comment on this post, including your twitter handle, with something that will make us smile! Smile

It can be anything, a nice comment, a joke, a link, a picture, anything! The only conditions are that it’s family / work friendly (I’m looking at you Mladen! Winking smile), legal, not breaching any kind of copyright laws and isn’t massively offensive.

The best four will be chosen by myself and the guys at RedGate so please be sure to leave your twitter handle in your comment so we can get in touch with the winners

Good luck!

 

Dave