Dynamic Date Ranges with Window Functions

After recently covering window functions I thought it would be a good opportunity to share one of my favourite use cases for them. The functions for ranking and aggregations have their own clear use cases but here I’d like to discuss a particular use for the LAG and LEAD functions.

The scenario

There are instances where you might have a set of data which should be associated with date ranges, such as price changes over time. These can be stored in a slowly changing dimension where we have an effective date associated with the attributes at the point of the change.

What we can find with this situation is that we try to join this to some other data and it can be tough with just a singular effective date. To support this we want to unwrap the data to provide a start and end date for the prices to make the join possible.

Let’s start out with some sample data for a couple of products we’ll work with:

CREATE TABLE #PriceHistory (
	ProductID INT,
	EffectiveFromDate DATE,
	Price DECIMAL(9,4)
);

INSERT INTO #PriceHistory
VALUES (1, '2021-04-12', 1.99),
	(1, '2021-11-28', 2.10),
	(1, '2022-01-01', 2.00),
	(2, '2020-01-01', 12.50),
	(2, '2021-07-11', 4.20),
	(2, '2022-02-20', 6.90);

When is it effective to?

In our sample data we’re starting with an effective from date so the first thing we need to look at is when the records should be effective to. Clearly we want that to be right up until the next price change is effective. Its worth noting here that we’re using a DATE data type so I’m looking for the effective to date being the day prior to the next price change so we can use the dates in range searches using the BETWEEN operator.

We can achieve this using the LEAD function to identify where the next record is effective from. We’ll be using the PARTITION based on the product and then ORDER the records based on the EffectiveFromDate. It’ll look something like this:

SELECT
	*,
	LEAD(EffectiveFromDate, 1)
		OVER (PARTITION BY ProductID
		ORDER BY EffectiveFromDate) [EffectiveToDate]
FROM
	#PriceHistory;

Now that we’ve got that we can improve it a little further by implementing a default to the function so we can set an end date to the final entry. On top of that we can then subtract a day from the result to get us to the true effective date. It would now look something like this:

SELECT
	*,
	DATEADD(DAY, -1,
		LEAD(EffectiveFromDate, 1, '2100-01-01')
		OVER (PARTITION BY ProductID
		ORDER BY EffectiveFromDate)) [EffectiveToDate]
FROM
	#PriceHistory;

Extending the range

When implementing the ranges like this I’ve seen instances where we have data for a period prior to any of the effective dates. As a result I like to extend the starting point for the initial price to be back at some historical point in time prior to any of the records we may need to join to. This is purely optional but thought it was worth including for completeness.

To do this we can use the LAG window function to identify the first record in the sequence (the one with the NULL value) so we can set a default start date overriding what’s already present in the data. We’ll use the same approach for PARTITION and ORDER BY as the LEAD function but this time we’ll be looking in the past:

SELECT
	*,
	CASE WHEN LAG(EffectiveFromDate)
		OVER (PARTITION BY ProductID
		ORDER BY EffectiveFromDate) IS NULL
		THEN '1900-01-01'
		ELSE EffectiveFromDate END [NewEffectiveFromDate]
FROM
	#PriceHistory;

You’ll see here we’ve implemented the default value slightly differently as we don’t want to use the result of the window function for the value, we just want it as an indicator of when a default needs to be set, otherwise we’ll use the actual value for that record.

As with most things there are other alternatives to this such as using the ROW_NUMBER window function to pick out record number 1 in the sequence.

All together now

Having covered both of those its time to put it all together into a single result set. I’ve taken the opportunity to move the hard-coded dates out into variables to make the script more maintainable in the future. Here is what it all looks like:

DECLARE @DefaultStartDate DATE = '1900-01-01';
DECLARE @DefaultEndDate DATE = '2099-12-31';
	
SELECT
	*,
	CASE WHEN ROW_NUMBER()
		OVER (PARTITION BY ProductID
		ORDER BY EffectiveFromDate) = 1
		THEN @DefaultStartDate
		ELSE EffectiveFromDate END [EffectiveFrom],
	DATEADD(DAY, -1,
		LEAD(EffectiveFromDate, 1,
			DATEADD(DAY, 1, @DefaultEndDate))
		OVER (PARTITION BY ProductID
		ORDER BY EffectiveFromDate)) [EffectiveTo]
FROM
	#PriceHistory;

Wrap up

Hopefully you’ve found it beneficial to see how we can dynamically create the date ranges for our slowly changing dimension based on a single effective date and a couple of window functions.

Clearly having gone through this process you might wonder if its a better option to model the dimension with an effective from and to date? It’s certainly an alternative but I thought it was worth sharing this since I’ve seen some implementations of this which have used bad logic in there and you can end up with duplicate entries or overlapping ranges if things aren’t quite right. This can compound issues downstream where you might end up trying to insert duplicate records into a fact table or a report may produce duplicate data when joined.

Opting for a single effective date we can then apply a unique constraint using the effective date – in this example the product and effective date – and with the range built dynamically we can be confident in the output being consistent and not overlapping.

The scenario

When is it effective to?

Extending the range

All together now

Wrap up

Share this:

Related

Leave a comment Cancel reply