Impact on Statistics of Rolling Back Transactions

I recently saw an Office Hours (at sea) Q&A from Brent where he was asked whether statistics are rolled back along with a transaction. It’s a great question which I’d never come across, so I thought it would be worth looking into.

TL;DR: They aren’t, but they may need to be updated when the data is next queried.

For a more comprehensive answer, let’s get a table and throw a selection of data in there:

/* Create a table and index */
CREATE TABLE dbo.StatsTable (
	ID INT IDENTITY(1, 1) PRIMARY KEY,
	SomeNumber INT NOT NULL
);

CREATE NONCLUSTERED INDEX IX_SomeNumber
ON dbo.StatsTable (SomeNumber);

/* Add some random sample data (1-10) */
WITH Numbers AS (
	SELECT Number = CAST(RAND() * 10 AS INT) + 1, Ctr = 1
	UNION ALL
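	/* Seed RAND with CHECKSUM(NEWID()) so it's evaluated per row;
	   an unseeded RAND() is evaluated once for the whole statement */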
	SELECT CAST(RAND(CHECKSUM(NEWID())) * 10 AS INT) + 1, Ctr + 1
	FROM Numbers
	WHERE Ctr < 1000	/* Counter to stop at 1000 */
)
INSERT INTO dbo.StatsTable (SomeNumber)
SELECT Number
FROM Numbers
OPTION (MAXRECURSION 1000);

/* Generate some stats */
SELECT SomeNumber, COUNT(1)
FROM StatsTable
GROUP BY SomeNumber
ORDER BY SomeNumber;

Now that we’ve queried the data, statistics have been generated on our index showing 1000 records distributed more or less evenly across 10 buckets:

Initial statistics generated from the data
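If you’re following along, you can view the histogram at any point with DBCC SHOW_STATISTICS (the same call we’ll use later in the post):

/* View the histogram for the index statistics */
DBCC SHOW_STATISTICS ('dbo.StatsTable', 'IX_SomeNumber');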

Let’s make some changes

To get our statistics to update we’ll add another 1000 records covering the range 11-20. This will add a few more buckets to the histogram so we’ll be able to tell the previous and new figures apart.

We’ll start a transaction, add the new data and then query the table which should force the statistics to update:

/* Start a transaction */
BEGIN TRAN;

/* Add another 1000 records (11-20) */
WITH Numbers AS (
	SELECT Number = CAST(RAND() * 10 AS INT) + 11, Ctr = 1
	UNION ALL
	SELECT CAST(RAND(CHECKSUM(NEWID())) * 10 AS INT) + 11, Ctr + 1
	FROM Numbers
	WHERE Ctr < 1000	/* Counter to stop at 1000 */
)
INSERT INTO dbo.StatsTable (SomeNumber)
SELECT Number
FROM Numbers
OPTION (MAXRECURSION 1000);

/* Query the data to update statistics */
SELECT SomeNumber, COUNT(1)
FROM StatsTable
GROUP BY SomeNumber
ORDER BY SomeNumber;

With that done we’ll go ahead and check the statistics:

Statistics showing additional data after being updated in a transaction

We can see those new figures added into the statistics, so let’s roll back the transaction and check the figures again:

/* Rollback the transaction */
ROLLBACK;

/* Check the statistics after rollback */
DBCC SHOW_STATISTICS ('dbo.StatsTable', 'IX_SomeNumber');
Statistics showing the same figures as prior to the rollback

Huh, we’ve rolled back our data but got the same statistics?

The thing with statistics is that they’re only updated when the data is queried and sufficient changes have been made since they were last generated. We can see more about this through the system function sys.dm_db_stats_properties:

SELECT s.[name], p.rows, p.steps, p.modification_counter
FROM sys.stats s
	CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) p
WHERE s.[object_id] = OBJECT_ID('dbo.StatsTable')
	AND s.[name] = 'IX_SomeNumber';
Modification counter showing the rolled-back records counted as modifications

As we can see here, our statistics were based on 2000 records being present, yet there have been 1000 changes since they were generated: the 1000 records we’ve just rolled back. So if we query the data again now, the statistics should get updated:

/* Query the data again */
SELECT SomeNumber, COUNT(1)
FROM StatsTable
GROUP BY SomeNumber
ORDER BY SomeNumber;

/* Check if statistics have been updated */
DBCC SHOW_STATISTICS ('dbo.StatsTable', 'IX_SomeNumber');
Statistics updated back to the pre-transaction figures after re-querying following the rollback

There we have it: the statistics have been updated again and we’re back to something which resembles what we started with before the transaction began.
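As an aside, we can sanity-check when that re-query will actually fire an update. The numbers below come from the documented auto-update thresholds rather than anything shown above: the legacy rule invalidates statistics once 500 + 20% of table rows have been modified, and from compatibility level 130 the engine uses the smaller of that and SQRT(1000 * rows). A quick sketch against our statistic:

/* A sketch based on the documented auto-update thresholds:
   compare the modification counter to the invalidation threshold */
DECLARE @rows BIGINT, @mods BIGINT, @threshold FLOAT;

SELECT @rows = p.[rows], @mods = p.modification_counter
FROM sys.stats s
	CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) p
WHERE s.[object_id] = OBJECT_ID('dbo.StatsTable')
	AND s.[name] = 'IX_SomeNumber';

/* Legacy threshold: 500 + 20% of rows; compat level 130+ takes the
   smaller of that and SQRT(1000 * rows) */
SET @threshold = 500 + (0.20 * @rows);
IF SQRT(1000.0 * @rows) < @threshold
	SET @threshold = SQRT(1000.0 * @rows);

SELECT [rows] = @rows,
	modifications = @mods,
	threshold = @threshold,
	update_due = IIF(@mods >= @threshold, 'Yes', 'No');

For the 2000 rows our statistics were built on, that works out to a threshold of 900, which the 1000 rolled-back rows comfortably exceed. It also explains why the single-record change in the next test won’t fire one.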

Testing a small change

Now we’ll do the same kind of thing again, but this time we’ll only change a single record in our data. As that won’t be enough to trigger a statistics update automatically, we’ll also trigger one manually:

/* Start a transaction */
BEGIN TRAN;

/* Add a new record */
INSERT INTO dbo.StatsTable (SomeNumber) VALUES (11);

/* Force statistics to update */
UPDATE STATISTICS dbo.StatsTable (IX_SomeNumber);

/* Check the new statistics */
DBCC SHOW_STATISTICS ('dbo.StatsTable', 'IX_SomeNumber');
Statistics after the forced update showing the single new record

We’ve got our 11th key present in the statistics, so we’ll again roll back the data change, query the data, and check whether the statistics have updated:

/* Rollback the transaction */
ROLLBACK;

/* Query the data again */
SELECT SomeNumber, COUNT(1)
FROM StatsTable
GROUP BY SomeNumber
ORDER BY SomeNumber;

/* Check the statistics after rollback */
DBCC SHOW_STATISTICS ('dbo.StatsTable', 'IX_SomeNumber');
Statistics remain incorrect after single record rollback

Now this is a little more interesting: our statistics haven’t changed this time, and the engine still believes that 11th value may be present.

A check on the statistics properties shows they were sampled from 1001 records and there has been 1 modification since that time (our rollback):

SELECT s.[name], p.rows, p.steps, p.modification_counter
FROM sys.stats s
	CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) p
WHERE s.[object_id] = OBJECT_ID('dbo.StatsTable')
	AND s.[name] = 'IX_SomeNumber';
Statistic details showing a single record has been modified after rollback

We can see that these statistics are being used when querying the data too: if we look at the estimated plan for our grouping query, the number of rows expected to be returned is 11.
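If you want to reproduce that check yourself, one option is to capture the estimated plan without executing the query:

/* One way to see the estimate: return the estimated plan as XML
   instead of executing, and inspect the row estimate it contains */
SET SHOWPLAN_XML ON;
GO

SELECT SomeNumber, COUNT(1)
FROM StatsTable
GROUP BY SomeNumber
ORDER BY SomeNumber;
GO

SET SHOWPLAN_XML OFF;
GO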

In this instance we didn’t get the statistics rolled back, as updating isn’t tied to the transaction itself but to the number of modifications made to the data.

Wrap up

Here we’ve looked at examples showing how statistics behave when transactions modify larger and smaller sets of data and are then rolled back.

The reason our statistics were regenerated when we rolled back the larger data set was the volume of modifications. The cost of updating statistics following a rollback is likely outweighed by the cost of queries relying on stale row estimates, though the update can also invalidate execution plans and force them to be recompiled.

When we’re only dealing with a small number of changes, even if statistics are updated within a transaction there’s no guarantee they’ll be updated again when the transaction is rolled back. This could lead to plans being generated from data which no longer exists.
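If you suspect a plan has been built against statistics describing rolled-back data, the straightforward fix is a manual refresh, as we did in the second test; the FULLSCAN option here is an optional extra to read every row rather than a sample:

/* Manually refresh the statistics so they reflect the data as it
   stands after the rollback; FULLSCAN reads every row */
UPDATE STATISTICS dbo.StatsTable (IX_SomeNumber) WITH FULLSCAN;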

This isn’t a situation I’ve seen impact a live workload, but I thought it was an interesting question to pose and look into. Have you seen this issue within your own workloads?
