Categories
SQL

Eliminating Bad Code Smells with Indexing Computed Columns

Bad code smells can run through a system, and one instance can quickly spread when code is recycled. Simon recently looked at a non-sargable example and was able to fix it by using an index and refactoring the query.

I wanted to consider an alternative approach if we saw the same issue repeated consistently. We don’t want to fix each instance, we want a single fix. We’ll solve this with indexed computed column.

We can index computed columns to help resolve deterministic (i.e. won’t change, no GETDATE() allowed) clauses. Let’s get started.

Here’s the query from the StackOverflow database:

SELECT COUNT(*)
FROM dbo.Posts p
WHERE YEAR(CreationDate) = 2009;
Execution plan for the query which is scanning the clustered index

As a baseline this needs on average 2300ms of CPU time and performs 801k reads as we can only scan the whole table.

We’ll start by adding a computed column to resolve the YEAR() function. This won’t be persisted so won’t impact query performance by itself:

ALTER TABLE dbo.Posts
ADD CreationYear AS YEAR(CreationDate);

Now for the magic by indexing this computed column:

CREATE INDEX IX_CreationYear
ON dbo.Posts (CreationYear);

This now persists the result of the function from the computed column, but outside of out clustered index. This index also produces a nice tight histogram compared to the entire CreationDate column:

DBCC SHOW_STATISTICS (
	'dbo.Posts',
	'IX_CreationYear'
) WITH HISTOGRAM;
Histogram details for the newly created index on the computed column

We can now re-run the same query as before without any code changes needed:

SELECT COUNT(*)
FROM dbo.Posts p
WHERE YEAR(CreationDate) = 2009;
Execution plan for the query which is using the new indexed computed column

The execution plan looks similar, but we’ve traded our clustered scan for a non-clustered seek thanks to the index.

More importantly, the performance. Execution is blistering, averaging 160ms with just over 2k reads. An improvement of 93% and 99.7% respectively.

All without needing to change the underlying code.

Wrap up

Bad code smells like non-sargable predicates can propagate through systems when code is reused. Here we’ve taken one example and looked how to to solve it through indexing a computed column.

This not only performs much faster than the original query, but critically without needing to change the code. This means that other re-used code can reap the benefits without any other changes.

This can also be a great approach if you don’t have access to the source of queries, for example an ORM which is building the statements itself.

2 replies on “Eliminating Bad Code Smells with Indexing Computed Columns”

Leave a reply to The Power of Computed Columns – Curated SQL Cancel reply