So today I thought I’d go over how to break out a comma separated string stored in sql server into individual rows.
This can be a real PITA, if you’ve ever had to do this you’ll know why. Now before I start this post I want to just say that this is a simplistic example and has one main assumption, which is the each of the values within the string you are wanting to split out are of a fixed width. The number of values can be totally variable but the values within the string must be of a static length.
If you have a string with variable with values this approach isn’t what you need, however it might give you the starting point
So lets assume we are working with a single field called “version” for example, and it contains strings such as:
What I want to do is get a line per version, now I’ve seen people use while loops, cursors and all sorts of craziness to solve this issue, when in fact it can be done in a single statement, using our old friend the numbers table!
So the first challenge is to determine how many values we are working with, I find the simplest way to do this is to count the number of delimiters and add one, something like the following:
(LEN([Version]) - LEN(REPLACE([Version], ',', ''))) + 1
What I’m doing here is taking the original length of the column we want to split then subtracting the length of the same column but with the delimiters removed, giving me the number of delimiters in the string, I simply then add one to that number to get the count of unique values I have to get, simples!
Ok so now we have the number of values, how do I go about breaking them out? well firstly, as mentioned earlier, we are going to need a numbers table. Now if you have one of these already in your database great, if not then there is actually a built in one you can use! You can see this table for yourself by running the following:
SELECT N.number FROM master..spt_values AS N WHERE N.type = 'P'
Cool, no? So we have a number table and the number of items we need to work with so lets start building the query:
SELECT S.[Version], N.number FROM dbo.Software AS S, master..spt_values AS N WHERE N.type = 'P' AND N.number < (LEN(S.[Version]) - LEN(REPLACE(S.[Version], ',', '')))
Firstly, yes I KNOW comma “joins” are the devils play thing but in this case its fine, honest!
This will give us a row per item, in the list. You’ll notice that I’ve left off the +1 on the end if the LEN statements at the bottom. This is because the number table is base 0 and this is very important to how this process works.
So now we add the following to the select clause:
SUBSTRING( REPLACE(REPLACE(S.[Version], ' ', ''), ',', ''), ( 4 * N.number ) + 1, 4) AS VersionNum
This might look a bit crazy but stick with it. So the middle line of the three is removing all spaces and commas (you need to remove anything that is not part of the values you wish to split out)
Then we use the result of that as the source for the substring, now the “4 * N.number” part is important because 4 is the length I know each of my values are, so modify this number to suit you. The “+ 1” part is due to the fact that the number table is base 0.
The final “,4” is how many characters you wish to extract from that value, as I wanted the whole value I have used 4, if for example I only wanted the part in front of the decimal place (referring back to my original example at the beginning of the post) then I’d change this value only to “,2”.
The end result looks like this:
So we’ve taken our original string “Version” and created a row per individual version number. All in a set based manner without cursors or while loops or anything else, which will always be more performant.
Hope this helps you out of string shredding hell