Sometimes you find a large table where data is building up and the history is getting less relevant by the day. The size might be consuming valuable storage, increasing your backup window, maintenance window, or query run times (everyone uses a WHERE clause, surely?). It might have been around for a while and the creator didn’t have anything built in to support pruning or shifting out the old data and the tables aren’t partitioned so you can’t switch them out either.
What you need is a way to get some of that data out of the table, ease those pressures, and keep things under control.
One of the options available to us is to craft a DELETE statement that removes the data we don’t need and uses the OUTPUT clause to insert the removed rows into another table. This executes as a single atomic statement, so each row will be in either your source or your destination table; at no point do you have two copies of the data. A very brief example below:
DELETE [dbo].[MyFullTable]
OUTPUT
    Deleted.Id,
    Deleted.EventTimestamp,
    Deleted.EventDetails,
    Deleted.SomeFkId
    /* etc... */
INTO [archive].[MyBigTable] (
    Id,
    EventTimestamp,
    EventDetails,
    SomeFkId
)
WHERE
    EventTimestamp < '2020-01-01';
This is built assuming you’ve already created the destination table – using the same data types, certainly, but you may well be able to do without the foreign keys or indexes used on the operational table, and you might choose to compress it if you’re looking to save more space.
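As a rough sketch of that setup – the column data types here are assumptions to match the example, and page compression is just one option (and depends on your edition and version supporting it):

CREATE TABLE [archive].[MyBigTable] (
    Id INT NOT NULL,
    EventTimestamp DATETIME2 NOT NULL,
    EventDetails NVARCHAR(MAX) NULL,
    SomeFkId INT NOT NULL   /* no FK constraint needed on the archive copy */
)
WITH (DATA_COMPRESSION = PAGE);  /* optional – saves space at some CPU cost */

Note there’s deliberately no foreign key back to the operational tables, which keeps the archive inserts cheap.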
As with a regular deletion, this may well result in blocking if you try to take too much data out in one go, so it’s also helpful to batch the process. Batching also lets you cap the total amount you remove in a run. The benefits of this method include:
- Limiting the amount removed can avoid unwanted log file growth
- Small amounts can be nibbled off more frequently to slowly work through the backlog
- You can choose the amount to be removed at different times or for different tables depending on maintenance windows and table sizes
- If it’s interrupted part-way through, only the current batch needs to roll back
So with all that said, how does that look in practice?
/* Define a retention period for the data */
DECLARE @RetentionPeriodInMonths INT = 12;
DECLARE @ArchiveFromDate DATE = DATEADD(MONTH, -@RetentionPeriodInMonths, GETDATE());

/* Initialise variables for the looping */
DECLARE @LoopCount INT = 0,
    @RecordsDeleted INT = 1,
    @MaxIterations INT = 50,
    @BatchSize INT = 1000;

/* Keep looping as long as we don't exceed the iteration limit and we're still removing data */
WHILE (@RecordsDeleted > 0 AND @LoopCount < @MaxIterations)
BEGIN
    /* Remove a batch from the table */
    DELETE TOP (@BatchSize) [dbo].[MyFullTable]
    OUTPUT
        Deleted.Id,
        Deleted.EventTimestamp,
        Deleted.EventDetails,
        Deleted.SomeFkId
    INTO [archive].[MyBigTable] (
        Id,
        EventTimestamp,
        EventDetails,
        SomeFkId
    )
    WHERE
        EventTimestamp < @ArchiveFromDate;

    /* Update variables ready for the next iteration */
    SET @RecordsDeleted = @@ROWCOUNT;
    SET @LoopCount = @LoopCount + 1;

    /* Wait and gather our thoughts...
       or just let other queries catch up */
    WAITFOR DELAY '00:00:01';
END
This script will work through the table and remove up to 50,000 records per run (50 iterations of 1,000 rows). If fewer rows than that need to be removed it’ll drop out early, but that’s the upper limit. It also pauses for a second after each batch to let any waiting queries continue.
These numbers work in this example – in the real world your table might be larger and need more iterations, or it might be busier and need smaller batches or longer waits, so the parameters should be tailored to the specific need. It’s also a great candidate for a stored procedure if you need to run it under different conditions – maybe very small batches during the day, a larger batch overnight and a big chunk during a maintenance window.
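A sketch of that stored procedure idea – the procedure name is made up here, and the defaults just mirror the script above:

CREATE PROCEDURE [dbo].[ArchiveMyFullTable]
    @BatchSize INT = 1000,
    @MaxIterations INT = 50,
    @RetentionPeriodInMonths INT = 12
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ArchiveFromDate DATE = DATEADD(MONTH, -@RetentionPeriodInMonths, GETDATE());
    DECLARE @LoopCount INT = 0, @RecordsDeleted INT = 1;

    WHILE (@RecordsDeleted > 0 AND @LoopCount < @MaxIterations)
    BEGIN
        DELETE TOP (@BatchSize) [dbo].[MyFullTable]
        OUTPUT Deleted.Id, Deleted.EventTimestamp, Deleted.EventDetails, Deleted.SomeFkId
        INTO [archive].[MyBigTable] (Id, EventTimestamp, EventDetails, SomeFkId)
        WHERE EventTimestamp < @ArchiveFromDate;

        SET @RecordsDeleted = @@ROWCOUNT;
        SET @LoopCount = @LoopCount + 1;
        WAITFOR DELAY '00:00:01';
    END
END

A daytime job might then call EXEC [dbo].[ArchiveMyFullTable] @BatchSize = 100, @MaxIterations = 10; while the overnight schedule passes much larger values.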
If you’re looking to save some space on faster storage or reduce maintenance windows, you could also use this approach to move the data into a different filegroup or even a different database, which could be worth considering.
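A minimal sketch of the filegroup variation – the filegroup, logical file name and path below are all illustrative:

ALTER DATABASE [MyDatabase] ADD FILEGROUP [ARCHIVE_FG];

ALTER DATABASE [MyDatabase] ADD FILE (
    NAME = N'ArchiveData',
    FILENAME = N'S:\SlowStorage\ArchiveData.ndf'  /* cheaper/slower disk */
) TO FILEGROUP [ARCHIVE_FG];

/* ...then place the archive table on that filegroup by
   appending ON [ARCHIVE_FG] to its CREATE TABLE statement */

The DELETE ... OUTPUT INTO pattern itself doesn’t change at all; only where the destination table lives does.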
This pattern works well for isolated tables. However, if the table is the target of a foreign key (such as an order > order line situation) it isn’t quite as straightforward, since you can’t simply delete those header records while child rows still reference them. That might be one for another time…