Hi guys,
I recently had an interesting sql problem passed to me and I thought I’d share the solution with you all as I couldn’t find anything nice online about it.
Background
So the issue was that we had a set of data that recorded peoples membership dates (lets say it was a gym memberships). The table looked like this (roughly
):
- Person_ID
- DateFrom
- DateTo
The requirement was to run a query that would return a year number for each row that indicated how many contiguous years the person had held a membership at that point in time. The additional complexity with this was that memberships that had a gap of less than or equal to 365 days should also be considered contiguous.
I thought about all kinds of approaches to this (and I’m sure there are some I didnt think of) but I eventually I decided that I needed to do the following:
- Assign every record a unique ID
- Assign every every record a parent id which will be the above created unique id of the record that comes before this one, where applicable.
- Use a recursive CTE to work out the year number
The Code
So lets set up the test data:
CREATE TABLE Memberships
(
MembershipID INT NOT NULL IDENTITY,
ParentID INT NULL,
PersonID INT NOT NULL,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
)
GO
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '1999-10-02', '2000-10-01')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2000-10-02', '2001-12-28')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2002-01-10', '2003-02-10')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2003-04-08', '2003-09-15')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2003-11-26', '2004-03-15')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2004-10-05', '2004-10-06')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2005-02-02', '2005-02-09')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2005-02-24', '2005-03-01')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2006-09-28', '2007-01-07')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(10154997, '2008-11-11', '2009-01-12')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1982-08-06', '1983-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1983-08-06', '1984-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1984-08-06', '1985-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1985-08-06', '1986-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1986-08-06', '1987-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1987-08-06', '1988-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1988-08-06', '1989-08-05')
Insert into Memberships(PersonID, StartDate, EndDate)
Values(613274, '1989-08-06', '1990-08-05')
GO
So firstly we need to get the id of the any date records that fall behind each record by no less than 365 days and set the parent id with it:
So as you can see here, we join the table to itself twice, firstly to get a list of all memberships that ended no more than 365 days before this one started and secondly to ensure if more than one record meets that requirement that only the closest one is returned as the “parent”.
UPDATE M
SET M.ParentID = M2.MembershipID
FROM dbo.Memberships AS M
JOIN dbo.Memberships AS M2 ON
M.PersonID = M2.PersonID
-- Where the enddate is less than 365 days away from my start date
AND DATEDIFF(dd,M2.EndDate, M.StartDate) <= 365
-- Stops dates before the start date coming through
AND DATEDIFF(dd,M2.EndDate, M.StartDate) > 0
-- This little sub query max sure that if there are multiple
-- start dates within 365 days of my end date that only
-- the closest one is returned.
AND M2.StartDate =
(
SELECT Max(X.StartDate)
FROM dbo.Memberships AS X
WHERE X.PersonID = M.PersonID
AND DATEDIFF(dd,X.EndDate, M.StartDate) <= 365
AND DATEDIFF(dd,X.EndDate, M.StartDate) > 0
);
Once this bit is done then its simply a matter of a reasonably simple recursive CTE:
WITH DateGroups (ParentID, MembershipID, PersonID,
StartDate, EndDate, YearNum)
AS
(
SELECT
M.ParentID,
M.MembershipID,
M.PersonID ,
M.StartDate ,
M.EndDate ,
1 AS YearNum
FROM dbo.Memberships AS M
WHERE M.ParentID IS NULL
UNION ALL
SELECT
M2.ParentID,
M2.MembershipID,
M2.PersonID ,
M2.StartDate ,
M2.EndDate ,
DG.YearNum + 1
FROM dbo.Memberships AS M2
JOIN DateGroups AS DG ON
M2.ParentID = DG.MembershipID
)
SELECT DG.ParentID ,
DG.MembershipID ,
DG.PersonID ,
DG.StartDate ,
DG.EndDate ,
DG.YearNum
FROM DateGroups AS DG
ORDER BY DG.PersonID, DG.StartDate,
DG.YearNum
and this then gives us the result set:

So that’s it really, dates can be a real pain to work with but I found this to be a nice elegant solution and its nice and quick too
A quick mention and thank you to my colleague Richard Thomas (@TR5AD) for passing this original problem my way
Enjoy!
Dave
