Running year number across date ranges

Hi guys,

I recently had an interesting sql problem passed to me and I thought I’d share the solution with you all as I couldn’t find anything nice online about it.

Background

So the issue was that we had a set of data that recorded peoples membership dates (lets say it was a gym memberships). The table looked like this (roughly Smile):

  • Person_ID
  • DateFrom
  • DateTo

The requirement was to run a query that would return a year number for each row that indicated how many contiguous years the person had held a membership at that point in time. The additional complexity with this was that memberships that had a gap of less than or equal to 365 days should also be considered contiguous.

I thought about all kinds of approaches to this (and I’m sure there are some I didnt think of) but I eventually I decided that I needed to do the following:

  • Assign every record a unique ID
  • Assign every every record a parent id which will be the above created unique id of the record that comes before this one, where applicable.
  • Use a recursive CTE to work out the year number

The Code

So lets set up the test data:

CREATE TABLE Memberships
(
    MembershipID INT NOT NULL IDENTITY,
    ParentID INT NULL,
    PersonID INT NOT NULL,
    StartDate DATE NOT NULL,
    EndDate DATE NOT NULL
)

GO

Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '1999-10-02', '2000-10-01')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2000-10-02', '2001-12-28')
Insert into Memberships(PersonID,  StartDate, EndDate)
Values(10154997, '2002-01-10', '2003-02-10')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2003-04-08', '2003-09-15')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2003-11-26', '2004-03-15')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2004-10-05', '2004-10-06')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2005-02-02', '2005-02-09')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2005-02-24', '2005-03-01')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2006-09-28', '2007-01-07')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(10154997, '2008-11-11', '2009-01-12')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1982-08-06', '1983-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1983-08-06', '1984-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1984-08-06', '1985-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1985-08-06', '1986-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1986-08-06', '1987-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1987-08-06', '1988-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1988-08-06', '1989-08-05')
Insert into Memberships(PersonID,  StartDate, EndDate) 
Values(613274, '1989-08-06', '1990-08-05')

GO

So firstly we need to get the id of the any date records that fall behind each record by no less than 365 days and set the parent id with it:

So as you can see here, we join the table to itself twice, firstly to get a list of all memberships that ended no more than 365 days before this one started and secondly to ensure if more than one record meets that requirement that only the closest one is returned as the “parent”.

UPDATE M
SET M.ParentID = M2.MembershipID 
FROM dbo.Memberships AS M
JOIN dbo.Memberships AS M2 ON 
        M.PersonID = M2.PersonID

        -- Where the enddate is less than 365 days away from my start date
        AND DATEDIFF(dd,M2.EndDate, M.StartDate) <= 365

        -- Stops dates before the start date coming through
        AND DATEDIFF(dd,M2.EndDate, M.StartDate) > 0 

        -- This little sub query max sure that if there are multiple 
        -- start dates within 365 days of my end date that only 
        -- the closest one is returned. 
        AND M2.StartDate =
        (
            SELECT Max(X.StartDate) 
            FROM dbo.Memberships AS X 
            WHERE X.PersonID = M.PersonID
            AND DATEDIFF(dd,X.EndDate, M.StartDate) <= 365
            AND DATEDIFF(dd,X.EndDate, M.StartDate) > 0
         );

Once this bit is done then its simply a matter of a reasonably simple recursive CTE:

WITH DateGroups (ParentID, MembershipID, PersonID, 
    StartDate, EndDate, YearNum)
AS
(
    SELECT  
        M.ParentID,
        M.MembershipID,
        M.PersonID ,
        M.StartDate ,
        M.EndDate ,
        1 AS YearNum
    FROM dbo.Memberships AS M 
    WHERE M.ParentID IS NULL

    UNION ALL 

    SELECT  
        M2.ParentID,
        M2.MembershipID,
        M2.PersonID ,
        M2.StartDate ,
        M2.EndDate ,
        DG.YearNum + 1
    FROM dbo.Memberships AS M2 
        JOIN DateGroups AS DG ON 
        M2.ParentID = DG.MembershipID
)

SELECT  DG.ParentID ,
        DG.MembershipID ,
        DG.PersonID ,
        DG.StartDate ,
        DG.EndDate ,
        DG.YearNum
FROM DateGroups AS DG
ORDER BY DG.PersonID, DG.StartDate, 
    DG.YearNum

and this then gives us the result set:

image

So that’s it really, dates can be a real pain to work with but I found this to be a nice elegant solution and its nice and quick too

A quick mention and thank you to my colleague Richard Thomas (@TR5AD) for passing this original problem my way

Enjoy!

Dave

Smile

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s