Categories
SQL

Optimising Sort Operators in Window Functions

We’re on quite a roll with window functions these past few weeks. Last week we looked at the operators we’d see in execution plans when using a window function. This week I wanted to tackle one of the more troublesome ones specifically: the Sort operator.

We know that sort operators are expensive in our queries. To use a window function our data needs to be sorted. How about if we need multiple functions? What if we’d like the output sorted too? Can we optimise any of those out of the execution plan?

Before we get started, we’ll use the sample data below as we have been recently:

CREATE TABLE #MonthlySales (
    FinancialYear CHAR(6),
    FinancialQuarter CHAR(7),
    FinancialPeriod CHAR(7),
    SalesValue MONEY
);
  
INSERT INTO #MonthlySales
VALUES ('FY2021', '2021Q01', '2021P01', 1700.99), ('FY2021', '2021Q01', '2021P02', 2.29), ('FY2021', '2021Q01', '2021P03', 9.99), ('FY2021', '2021Q02', '2021P04', 1214.85), ('FY2021', '2021Q02', '2021P05', 2049.10), ('FY2021', '2021Q02', '2021P06', 3.99),
    ('FY2021', '2021Q03', '2021P07', 54.99), ('FY2021', '2021Q03', '2021P08', 24.49), ('FY2021', '2021Q03', '2021P09', 2181.56), ('FY2021', '2021Q04', '2021P10', 564.99), ('FY2021', '2021Q04', '2021P11', 699.09), ('FY2021', '2021Q04', '2021P12', 53.99),
    ('FY2022', '2022Q01', '2022P01', 29.99), ('FY2022', '2022Q01', '2022P02', 7.95), ('FY2022', '2022Q01', '2022P03', 21.49), ('FY2022', '2022Q02', '2022P04', 769.49), ('FY2022', '2022Q02', '2022P05', 21.98), ('FY2022', '2022Q02', '2022P06', 539.99),
    ('FY2022', '2022Q03', '2022P07', 3399.99), ('FY2022', '2022Q03', '2022P08', 32.60), ('FY2022', '2022Q03', '2022P09', 49.99), ('FY2022', '2022Q04', '2022P10', 1000.43), ('FY2022', '2022Q04', '2022P11', 742.35), ('FY2022', '2022Q04', '2022P12', 4.99);

Many many sorts

Let’s take a look at a query plan where we’re using a single window function:

SELECT *,
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialQuarter)
FROM #MonthlySales;
Execution plan for a window function showing a single sort operator

We’ve got our data being sorted by Financial Quarter as per the Partition clause. This is then segmented and is being spooled and aggregated. How does that look if we add another window function into the query?

SELECT *,
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialQuarter),
	YearlyAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear)
FROM #MonthlySales;
Execution plan for a query with two window functions and requires two sort operators

As you can see from the annotation, we effectively have two of the plans above merged together to produce this plan. Critically this also means that we need to apply two sort operations.

The reason for needing two operators is that our partition clauses are different for each window function. As a result the data needs to be re-sorted before it can be segmented again for the next function. The first sort is still based on the Financial Quarter, and the second one is by the Financial Year.

Now let’s take it one step further and sort our outputs too:

SELECT *,
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialQuarter),
	YearlyAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear)
FROM #MonthlySales
ORDER BY FinancialQuarter;
Execution plan for a query with two window functions and sorting which requires three sort operators

There we have it, three sort operators in the same query. The final one we’ve introduced is to sort by the Financial Quarter (again) due to our ORDER BY clause.

That could cause a headache. So what can we do to help alleviate these sort operators?

Reordering window functions

We can see that our plan is executing the window functions in the order which they are defined. Currently the plan will sort the data based on Quarter, then by Year, and finally by Quarter again. What if we change the order of our window functions?

SELECT *,
	YearlyAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear),
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialQuarter)
FROM #MonthlySales
ORDER BY FinancialQuarter;
Execution plan following reordering of window functions which removes sort for output ordering

We’ve managed to reduce the plan to only require two sorts. Our partition based on Quarter is being processed second so the records are already in the right order to return the data. We’ve optimised out the final Sort operator.

I’m not quite sure why the engine doesn’t identify and make this optimisation itself when processing the query. We may simply be able to optimise our plans by re-ordering the columns being returned.

Can we go further? Why yes, we can.

Explicit sort conditions

Due to the data which we’re using, we can understand there is a hierarchy in place in which a Period belongs to a Quarter and a Quarter belongs to a Year. The database engine isn’t aware of this however.

We know that by ordering by Quarter the data will implicitly be ordered by Year. Let’s make that explicit in both our window function and ordering clause to see what happens:

SELECT *,
	YearlyAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear),
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear, FinancialQuarter)
FROM #MonthlySales
ORDER BY FinancialYear, FinancialQuarter;
Execution plan showing only one sort required after combining window function sorting fields

We’ve managed to reduce the query down to use a single Sort operator. This will sort based on Financial Year followed by Financial Quarter which will satisfy all our partitions and ordering

We will still need to segment the data twice as we have different partition clauses, however we won’t need additional sorts to support those.

That’s much improved from having three sort operators, but can we go even further?

Indexing

A mighty index comes to the rescue. We’ll create an index using the same fields as the sort operator so we know our data will come sorted from source – if the plan chooses to use the index.

We’re cheating here a bit as we’ll be doing some work on our data up front to sort it. If we’re re-using this plan to read data much more than we’re writing to (and updating) the index then this trade off may very well be worthwhile.

Let’s go ahead and create a covering index which will satisfy the whole query for simplicity:

CREATE NONCLUSTERED INDEX FinancialYear_FinancialQuarter_Inc
ON #MonthlySales (FinancialYear, FinancialQuarter)
INCLUDE (FinancialPeriod, SalesValue);

With this in place let’s give it one more execution:

SELECT *,
	YearlyAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear),
	QuarterAverage = AVG(SalesValue)
        OVER (PARTITION BY FinancialYear, FinancialQuarter)
FROM #MonthlySales
ORDER BY FinancialYear, FinancialQuarter;
Execution plan for the query which has no sort operators with the use of an index

Voila, we’ve removed all of the Sort operators. We know that the data comes out of the index in the correct order to satisfy our partition clauses as well as our output ordering.

Wrap up

In this post we’ve looked at Sort operators in execution plans for Window Functions. Using these functions can quickly produce additional Sort operators into our plans. We’ve looked at a few options for how to optimise these which were:

  • Reordering window functions to reuse existing sorting
  • Including implicit sort conditions explicitly to allow reuse of sorted data
  • Indexing the data in a way which matches the sort conditions

With the use of all of these approaches we’ve been able to remove all sorting from our execution plan.

Have you had issues with copious sorting within complex windowed queries before? I’d like to hear if any of these suggestions would have helped or if there’s any further options which I haven’t considered.

3 replies on “Optimising Sort Operators in Window Functions”

Leave a comment