How do you type 2?!

Hi all!

I sometimes get asked from all different people how I deal with my slowly changing dimension tables and I thought it would be interesting to do a post on how I generally approach it and then see how everyone else does it too. That way we can all learn from the best bits of everyone’s ideas! Smile

Over the years the way I do this has evolved and changed from the experiences I have had (for better and for worse), and of course no two implementations are the same.

Now being an MS bod I generally use a combination of SSIS & SQL code for my ETL processing. Just before we crack on, I’d just like to make one assumption clear, in my opinion the SCD wizard transform / component of SSIS is pants!

It’s slow, memory hungry and clunky. I know some people do use it and don’t get me wrong it does work but, again in my opinion, it’s not great.

So, that assumption holding true we need to deal with SCD’s by hand so lets look at my simplistic template to how I approach the most common SCD, type 2.

Firstly what I’ll do is, for each dimension that needs this SCD behaviour,  I will set up what I call a MDCR table (Master Data Control Record) in the staging area (database) for my warehouse solution. Don’t ask me where this acronym came from because I honestly can’t remember!  Smile

So, if I have a dimension called dimProduct that requires type 2 behaviour I’d create a table something like this

CREATE TABLE MDCR_dimProduct( ProductKey INT NOT NULL, – The surrogate key for the dimension ProductNaturalKey VARCHAR(50) NOT NULL, – The Natural or Business key

RowHash INT NOT NULL, – A hash across all type 2 fields in the dimension EffectiveDate DATE NOT NULL, ExpirationDate DATE NULL, MostRecent BIT NOT NULL)

Ok, so what we have is the surrogate key for the dimension, the key for the customer from the source system (business key / BK), a hash value generated with the HASHBYTES() function with the MD5 hash and the dates the record is effective from and to.

Now every record in the dimension, in this case dimProduct, has a  corresponding record in here. In doing it this way I can avoid having any of the type 2 information actually in the dimension.

From here the process is quite straight forward, you bring in all the source data, ensuring that you add an MD5 hashto each row on the corresponding columns to those that you used in the target dimension.

This source data is then checked it against our MDCR table, joining on the business key and MostRecent =1. If the business key doesn’t exist then its a new record so I add a new column called IsNew or similar and tag the record as a new entity.

If the business key does exist then it progresses onto a second phase. In this second phase we compare the hashvalues for the source row to the hashstored in the MDCR table. If the hash values match then no change has been made and this record’s journey stops there. If they do not match then again I add a field called IsNew and the record gets tagged as an update.

Now I take both the new records and update records and combine them together before placing them in a pre created temporary processing table.

Data Flow

Once all relevant records have made it into this temporary processing table I then run a stored procedure.

This stored procedure takes every row in the temporary processing table and adds a new record to our dimension table, using an OUTPUT on the insert to ensure we get the newly added rows surrogate key values and business key.

This may seem strange but if you think about it, in a type two environment, new dimension members are new records but so are updates to existing members.

Once this is done we then take the surrogate key, business key and hash values for all of our new records and add them to the MDCR table.

The final step is to then simply set the MostRecent flag to 0 in the MDCR table for all previous versions of a dimension member (record) that has received an update

The code for the update SP looks something like this

CREATE PROCEDURE [dbo].[UpdateDimProduct]
AS
    SET NOCOUNT ON

    DECLARE @NewRecs TABLE
    (
        ProductKey INT,
        ProductNaturalKey VARCHAR(50),
        RowHASH    VARBINARY(16)
    )  

    -- add the new records and output the new product keys 
    INSERT INTO dbo.dimProduct
            ( ProductName ,
              ProductNaturalKey ,
              ProductType
            )
    OUTPUT 
        INSERTED.ProductKey
        ,INSERTED.ProductNaturalKey, NULL 
    INTO @NewRecs
    SELECT  TDP.ProductName ,
            TDP.ProductNaturalKey ,
            TDP.ProductType
    FROM dbo.temp_dimProduct AS TDP

    -- Get the record hash
    UPDATE NR
    SET NR.RowHASH = TDP.RowHASH
    FROM @NewRecs AS NR
    JOIN dbo.temp_dimProduct AS TDP ON
        NR.ProductNaturalKey = TDP.ProductNaturalKey  

    -- update the relevant maintence records
    UPDATE MDP
    SET MDP.ExpirationDate = dateadd(dd, -1, GETDATE()),
        MDP.MostRecent = 0
    FROM dbo.MDCR_dimProduct AS MDP
    JOIN dbo.temp_dimProduct AS TDP ON
        MDP.ProductNaturalKey = TDP.ProductNaturalKey
        AND MDP.MostRecent = 1
        AND TDP.IsNew = 0  

    --add the new maintenance record
    INSERT INTO dbo.MDCR_dimProduct
            ( ProductKey ,
              ProductNaturalKey ,
              EffectiveDate ,
              ExpirationDate ,
              MostRecent ,
              RowHASH
            )
    SELECT  NR.ProductKey ,
            NR.ProductNaturalKey ,
            GETDATE() AS EffectiveDate,
            NULL AS ExpirationDate,
            1 AS MostRecent,
            NR.RowHASH 
    FROM @NewRecs AS NR

    SET NOCOUNT OFF

And that’s about it.

As I said this is a very broad description of how I generally approach type 2 dimensions. So, hope this has been helpful and I love to hear how you type 2 as well!

Enjoy

Dave

SQL Saturday 194 in Exeter – Part 2

HI all, so as promised here is my follow up post with the slides and demos from both the pre con and also my session on the Saturday. The files can be found here and are password with the password Dave and I gave out on the day (my Saturday session files are not passworded)

https://dl.dropbox.com/u/81661862/ProceduralVsSetBased.zip

https://dl.dropbox.com/u/81661862/SQL%20Sat%20Exeter.zip

https://dl.dropbox.com/u/81661862/SQL%20Server%202012.zip

I also just wanted to take the time to share something really nice that we received from the event, the session feed back. I can’t emphasise enough how much we as speakers appreciate it when people take the time to give feed back, especially comments on top of the scores.

Its also really nice when all the scores are high numbers Smile If you have ever been lucky enough or are ever lucky enough to get to present a conference session or a pre con, you’ll know that a lot of work goes into it, and getting feedback makes it all worth while. Good feedback is awesome but also any constructive criticism is always welcomed too. I don’t personally mind if you give me a low score as long as you tell me why, no one is perfect and this kind of feed back is the best way to improve.

That being said, the two feedbacks I got from SQL Saturday in Exter where all amazing, it’s genuinely humbling and means a lot when people give nice feed back and say that it has helped them

So, firstly I want to share the scores from Dave Ballantyne and my pre con session. By the way all scores are out of 5

image

As you can see we got pretty much full marks in all sections, again this is great and we really appreciate it. The comments as well really made us go “wow”. Again, amazing.

Then the feedback for my session on Saturday, again scores are out of 5

image

Again I was really amazed at the scores and also the comments. Overall I’m totally overwhelmed by how positive everyone was and all the nice things that people said, in feedbacks, on the day and also via twitter.

image

image

image

Again a selection of a couple of the really nice things that people said post the event.

Now you might be asking why “Dave, why are you posting all this ego stroking material” well for a few reasons, to say thank you to everyone that took the time to leave the great feedback. Also to say that it really helps everyone leaving feedback, even the organisers, so they can gauge how the event went down.

And finally to share it so that anyone thinking about speaking for the first time can see what to expect and also that generally everyone is really nice and that all feedback is good, even constructive criticism.

Anyway that’s it for now

Thanks

Dave!

SQL Saturday 194 in Exeter – Looking Back

So this weekend finally saw SQL Saturday Exeter arrive, for me its felt like ages in the build up so it was great when the time finally came!
I traveled up on the Thursday night ready for the pre con Dave Ballantyne and I were doing on the Friday.

We started the Friday morning the breakfast of champions way!

image

As you can see, it was a good way to start the day!

We then went on to our room and waited for our awesome delegates to arrive. We actually “won” the day in terms of numbers having about 16 attendees, something we didn’t hesitate to rib the other pre con trainers about ;)

Having never done a pre con before I wasn’t sure what to expect but the whole day was great, everyone that attended was interested and engaged and asking loads of questions, which is fantastic.

It all went mostly without a hitch other than the “demo gods” messing with us once it twice. The best of these was myself “loosing” a demo I wanted to run spending a couple of minutes trying to find it to no avail and then stumbling across it about 40 minutes later, to the sound of much laughter from the attendees I might add! ;)

image

We broke the day up into four chunks with Dave and I alternating between sections and chipping to each others segments too, which seemed to work really well.

image

A great day of training followed by the speaker meal and a few beers. Miss ninja also came down and joined us for the evening with a view to coming along on the Saturday.

Saturday came and the main event was upon us, was great to see all the speakers from the sql family there and loads of engaged attendees too.

I did my session at ten fourty five and had a full room of engaged people, again asking lots of interesting questions. I was even graced by Jamie Thompson which was good who asked a few questions and kept me on my toes! :)

We did unfortunately slip off a little early as myself and miss ninja started to feel a bit under the weather but all in all an awesome weekend was had!

I’m on a train as I write this so I will update the post later today adding a password zip file containing all slides and demos from both the pre con and Saturday session

As a final thing, I want to say a massive thank you to all the organisers and helpers, you guys made this an awesome event and I hope everyone recognises your hard work

Anyway that’s it for now

Dave

SQL Sat 194 – Exeter Pre Con

sqlsat194_whitebackground_web

Hi all!

As you may or may not know, Dave Ballantyne and myself are lucky enough to be doing a pre con session at the up coming SQL Sat 194 in Exeter on the 8th of March 2013.

This is my first ever pre con session and I don’t mind telling you that, from my side at least, it has been a lot of work to put together. I’m not sure what I expected when Dave and I agreed to do it but it has certainly taken up a few evenings and weekends Smile

That being said, it’s been and labour of love and I’m genuinely really excited about spending a day with a group of people that have spent real cash monies to come and see Dave and myself. It’s genuinely a humbling experience when you think about it like that, so to all that are attending, thank you! Open-mouthed smile (There may also be some Haribo as a thank you as well on the day)

So, as a teaser for the people that are attending I thought I’d give you a sneak peak of what you’ve got to look forward to during the day:

  • Query Plan Deep Dives
    • Taking a deep look into statistics, plan caching and query plan internals and more
  • Procedural vs set based code
    • Looking at the performance implications of cursors, UDFs, TVFs, Recursive CTE’s and more
  • TSQL Myths and Misconceptions
    • Taking a look at common mistakes people make in sql server and TSQL coding and also challenging common misconceptions and misunderstandings
  • SQL Server 2012 TSQL Improvements
    • A good look at some of the new TSQL and other functionality of sql 2012

So there it is, I hope that will whet your appetite for getting stuck in on the day. If there’s anything any of you attendees want to take a specific look at then please either drop a comment on this post or drop me an email (davidmorrison2009 at googlemail dot com) and we will see what we can do

Look forward to seeing you in Exeter!

Dave

SQL Saturday 194 – I’m Speaking! (Yay!)

Hi all,

ImSpeaking_SQLSatExeter_web

Bit of a belated post but I wanted to just share the news! Smile  As the post title implies I have had a session selected to present at the upcoming PASS SQL Saturday Exeter event.

Not only that but myself and Dave Ballantyne were also lucky enough to be offered the opportunity to do a whole days pre-con session as well! (http://sqlsouthwest.co.uk/SQLSaturday_precon.htm#pc1)

This will be my first ever pre-con session so if you’re coming, be gentle with me! Winking smile There may also be some haribo and other goodies in it for you when you come along!

I’m always genuinely excited and also very humbled whenever I get a session chosen to present, there are a whole host of very smart people out there who have a lot of really interesting and clever stuff to say, so its always genuinely an honour. So thank you to everyone that voted for my session Open-mouthed smile

If you are coming along to SQL Sat Exeter then please find me and see me and say hi! It’s always good to put faces to (twitter) names. I’ll also have the lovely miss Ninja is tow as well as I have convinced here that being a SQL geek is where it’s at! Winking smile

Anyway, hope to see you in Exeter!

Dave

 sqlsat194_whitebackground_web

What’s your leash?

Hi all, not a technical post today but more food for thought. I’m going to relate it mainly to career progression but to be honest it applies to all aspects of life (Get me being deep! Smile)

Anyway . . .

So, the other day I was walking back from my local super market, I came to a long stretch of path, quite wide with greenery on either side. Coming up from the other direction was a guy with quite a large dog. The guy looked respectable enough and to be honest big dogs don’t really bother me that much so I thought nothing of it.

As we got a little closer to passing each other I noticed this was a Rottweiler, it was a big strong and proud looking animal and looked quite happy to be plodding along by its owners side. Again I thought nothing of it as I spent a fair few hours around rescue Rottweiler’s as a kid and know first hand that the majority of them and big soppy softies! Smile

Just as we went to pass each other the dog decided it wanted to come and say “hi” to me (There was no malice in its action, I’m pretty sure it was just being friendly), the owner noticed this and simply gave the dogs leash (lead) a gentle tug and said something along the lines of “come on” to the dog and we all went on our way.

As I was walking back to my place I started to think about the dog and its owner and that little scenario I just described. As I thought about it some more I realised was that the reality of the situation was if that dog really wanted to make its way to me, be it to just say hi or to cause me harm, there was very little myself or the owner could have done about it, due to the large and muscular nature of the animal. Yet all the owner had to so was give the littlest of tugs and a gentle word and this strong and proud animal continued on its way.

Again this got me thinking as to how this situation could come about, if this was a “wild” animal on a lead it wouldn’t of happened, so I mentioned it to a friend of mine who trains dogs for a living. He said its all to do with conditioning from when they are pups, basically when they are small, that gentle tug on the lead and the gentle word are much more forceful. He went on to explain that effectively this idea of you being bigger and stronger than them just kind of sticks in there head and this is why the behaviour I saw was possible.

All interesting stuff, you might be thinking, but what has this got to do with me?

Mental Leash

Although we as human being obviously don’t get lead around on leash on a day to day basis, we are however held back by our own mental leashes.

As part of my job, I get to travel around a lot and go into a whole host of companies of differing size and ethos and the one common thing I can say every where I go is that I see is people being lead around by their own mental leashes, being held back by their own insecurities, fears and in some cases even laziness and lack of motivation. People stuck in a rut or stubborn in their ways. All these things have been built around experiences from the past, both positive and negative, just like a puppy is trained that its pointless to pull on its leash or that no means don’t do that.

People that allow themselves to be lead totally around by their leashes, generally speaking, are the ones who’s qualifications are from years ago, that haven’t been on a course or attended a user group / seminar or brought a book on their particular area of study for years. They end up stuck in the same role for the same company for umpteen years and get institutionalised.

All of these things can be a metaphorical leash around your neck being given a gentle tug, or the metaphorical firm word directing you away from what you want.

Now at this stage, so I don’t start to sound like a holier than thou tool, I’d like to point out that I’m far from perfect, I have a lot of my own “mental leashes” and over the last year or so I’ve been slowly trying to remove them, some more successfully than others. I still have quite a few but I try not to let them pull me around quite so much

So, what’s your leash?

My challenge to you all reading this now is to think about your leashes, what are the things that might be holding you back. Is it that little voice inside that says “cant” or maybe that fear you get when something new comes along?

Whatever they are, identifying them is the first step, and then pulling against them / removing them so you can approach that metaphorical hansom stranger and say “hi” without getting that tug around your neck

Final thoughts

Anyway this was just something that’s been on my mind and I wanted to share it, I hope you’ve found it interesting. Please feel free to drop me any comments as I welcome all feedback

Enjoy!

Dave

Running year number across date ranges

Hi guys,

I recently had an interesting sql problem passed to me and I thought I’d share the solution with you all as I couldn’t find anything nice online about it.

Background

So the issue was that we had a set of data that recorded peoples membership dates (lets say it was a gym memberships). The table looked like this (roughly Smile):

  • Person_ID
  • DateFrom
  • DateTo

The requirement was to run a query that would return a year number for each row that indicated how many contiguous years the person had held a membership at that point in time. The additional complexity with this was that memberships that had a gap of less than or equal to 365 days should also be considered contiguous.

I thought about all kinds of approaches to this (and I’m sure there are some I didnt think of) but I eventually I decided that I needed to do the following:

  • Assign every record a unique ID
  • Assign every every record a parent id which will be the above created unique id of the record that comes before this one, where applicable.
  • Use a recursive CTE to work out the year number

The Code

So lets set up the test data:

CREATE TABLE Memberships
(
    MembershipID INT NOT NULL IDENTITY,
    ParentID INT NULL,
    PersonID INT NOT NULL,
    StartDate DATE NOT NULL,
    EndDate DATE NOT NULL
)

GO

Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '1999-10-02', '2000-10-01')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2000-10-02', '2001-12-28')
Insert into Memberships(PersonID,  StartDate, EndDate)
Values(10154997, '2002-01-10', '2003-02-10')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2003-04-08', '2003-09-15')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2003-11-26', '2004-03-15')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2004-10-05', '2004-10-06')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2005-02-02', '2005-02-09')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2005-02-24', '2005-03-01')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2006-09-28', '2007-01-07')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2008-11-11', '2009-01-12')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1982-08-06', '1983-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1983-08-06', '1984-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1984-08-06', '1985-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1985-08-06', '1986-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1986-08-06', '1987-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1987-08-06', '1988-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1988-08-06', '1989-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1989-08-06', '1990-08-05')

GO

So firstly we need to get the id of the any date records that fall behind each record by no less than 365 days and set the parent id with it:

So as you can see here, we join the table to itself twice, firstly to get a list of all memberships that ended no more than 365 days before this one started and secondly to ensure if more than one record meets that requirement that only the closest one is returned as the “parent”.

UPDATE M
SET M.ParentID = M2.MembershipID 
FROM dbo.Memberships AS M
JOIN dbo.Memberships AS M2 ON 
        M.PersonID = M2.PersonID

        -- Where the enddate is less than 365 days away from my start date
        AND DATEDIFF(dd,M2.EndDate, M.StartDate) <= 365

        -- Stops dates before the start date coming through
        AND DATEDIFF(dd,M2.EndDate, M.StartDate) > 0 

        -- This little sub query max sure that if there are multiple 
        -- start dates within 365 days of my end date that only 
        -- the closest one is returned. 
        AND M2.StartDate =
        (
            SELECT Max(X.StartDate) 
            FROM dbo.Memberships AS X 
            WHERE X.PersonID = M.PersonID
            AND DATEDIFF(dd,X.EndDate, M.StartDate) <= 365
            AND DATEDIFF(dd,X.EndDate, M.StartDate) > 0
         );

Once this bit is done then its simply a matter of a reasonably simple recursive CTE:

WITH DateGroups (ParentID, MembershipID, PersonID, 
    StartDate, EndDate, YearNum)
AS
(
    SELECT  
        M.ParentID,
        M.MembershipID,
        M.PersonID ,
        M.StartDate ,
        M.EndDate ,
        1 AS YearNum
    FROM dbo.Memberships AS M 
    WHERE M.ParentID IS NULL

    UNION ALL 

    SELECT  
        M2.ParentID,
        M2.MembershipID,
        M2.PersonID ,
        M2.StartDate ,
        M2.EndDate ,
        DG.YearNum + 1
    FROM dbo.Memberships AS M2 
        JOIN DateGroups AS DG ON 
        M2.ParentID = DG.MembershipID
)

SELECT  DG.ParentID ,
        DG.MembershipID ,
        DG.PersonID ,
        DG.StartDate ,
        DG.EndDate ,
        DG.YearNum
FROM DateGroups AS DG
ORDER BY DG.PersonID, DG.StartDate, 
    DG.YearNum

and this then gives us the result set:

image

So that’s it really, dates can be a real pain to work with but I found this to be a nice elegant solution and its nice and quick too

A quick mention and thank you to my colleague Richard Thomas (@TR5AD) for passing this original problem my way

Enjoy!

Dave

Smile