When it comes to removing data from a heap, it’s pretty similar to removing it from a table with a clustered index:
DELETE TOP (1000)
FROM HeapityHeap
WHERE SomeValue > @Threshold;
That’s all folks!
Wait, or would you like your data pages back too? Yeah, it might not be quite that easy…
You see, when you remove a small number of records from a heap, you’ll typically find the size of the table doesn’t reduce. It’s one of those quirks of heaps which can be both a benefit and a headache. To illustrate this I’ll take a copy of the Person table in the AdventureWorks2019 database:
/* Create a copy of the table into a new heap */
SELECT *
INTO [Person].[PersonCopy]
FROM Person.Person;
/* How many records do we have? */
SELECT COUNT(*) FROM [Person].[PersonCopy];
/* 19,972 records */
/* Let's see what's inside */
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* 3,809 pages of IN_ROW_DATA */
So now that we’ve got a bundle of records in that table, what happens if we just try to delete 1,000 of them?
/* Let's delete a few */
DELETE TOP (1000) FROM [Person].[PersonCopy];
SELECT COUNT(*) FROM [Person].[PersonCopy];
/* 18,972 records */
/* How are things looking now? */
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* 3,809 pages of IN_ROW_DATA ?! */
/* How about if we run it a few more times? */
DELETE TOP (1000) FROM [Person].[PersonCopy];
GO 10
SELECT COUNT(*) FROM [Person].[PersonCopy];
/* 8,972 records */
/* ... and now? */
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* Still 3,809 pages of IN_ROW_DATA 😦 */
You’ll see the issue here: the pages just aren’t being released after the data is removed from them. In fact, if we continued to delete all of the records from the table in the same way, we’d still have those 3,809 pages allocated. Bummer.
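If you’re curious just how empty those retained pages are, you can rerun sys.dm_db_index_physical_stats in DETAILED mode (a full scan of every page, so only something you’d do on a small table like this demo) and look at avg_page_space_used_in_percent:
/* DETAILED mode reads every page and reports how full each allocation unit is */
SELECT index_type_desc,
       alloc_unit_type_desc,
       page_count,
       record_count,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, 'DETAILED');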
More details about this can be found in this article, but essentially it boils down to the fact that when removing records from a heap, a table-level lock needs to be acquired for the empty pages to be deallocated. If we add a hint for that onto the query:
/* Let's try with a table lock */
DELETE TOP (1000) FROM [Person].[PersonCopy] WITH (TABLOCK);
SELECT COUNT(*) FROM [Person].[PersonCopy];
/* 7,972 records */
/* ... and finally */
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* 3,608 pages allocated! */
Notice that this has only reduced the number of allocated pages from 3,809 to 3,608. The TABLOCK hint only releases the pages emptied by this particular delete; any unreleased pages from prior delete operations are still left in place.
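As an aside, you could demonstrate this by clearing out the rest of the table under a table lock (don’t run this if you’re following along, as the rebuild below assumes those remaining rows are still in place):
/* Hypothetical follow-up: remove everything that's left, under a table lock */
DELETE FROM [Person].[PersonCopy] WITH (TABLOCK);
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* We'd expect the pages emptied by this delete to be released, but the
   empty pages left behind by the earlier deletes to remain allocated */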
If there have been a lot of records already deleted from the table and you’re looking to free up the space, there aren’t too many options to get those pages released:
- Truncate and reload the table – the truncate operation locks the table so the pages can be released (sketched below)
- Rebuild the table – although in doing this you’ll also rebuild all non-clustered indexes on the table
- Apply a clustered index to the table – which has a similar effect to rebuilding the table and all of the indexes on it (also sketched below)
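For reference, the first and third options might look something like this. It’s just a sketch rather than part of the demo (we’ll use the rebuild below instead), the index name is made up, and the reload assumes the original Person.Person source is still around to copy from:
/* Option 1: truncate and reload - TRUNCATE TABLE releases all of the heap's pages */
TRUNCATE TABLE [Person].[PersonCopy];
INSERT INTO [Person].[PersonCopy]
SELECT * FROM Person.Person;
/* Option 3: apply a clustered index - the data is rebuilt into the new b-tree,
   releasing the empty heap pages (though the table is no longer a heap afterwards) */
CREATE CLUSTERED INDEX IX_PersonCopy_BusinessEntityID
ON [Person].[PersonCopy] (BusinessEntityID);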
In this instance, as it’s a small table we’re using to demonstrate, we can rebuild it to show the pages being released:
/* Rebuild the table to release the pages */
ALTER TABLE [Person].[PersonCopy] REBUILD;
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);
/* We're now down to 1,596 pages allocated! */
With all of that being said, at the start I mentioned that there can also be benefits to the way this works. Even though the pages are empty after the deletes, SQL Server can reuse them if the same table needs them again, rather than allocating new pages. This is one reason heaps can be good candidates for staging tables: the pages remain allocated and can be reused for subsequent loads, rather than being continually allocated and deallocated.
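If you wanted to see that reuse in action, here’s a rough sketch continuing with the rebuilt PersonCopy heap from above (the exact page counts will vary):
/* Empty the heap without a table lock - the pages stay allocated */
DELETE FROM [Person].[PersonCopy];
/* Reload a similar amount of data, as a staging process might
   (TOP without ORDER BY picks arbitrary rows, which is fine for this sketch) */
INSERT INTO [Person].[PersonCopy]
SELECT TOP (7972) * FROM Person.Person;
/* The page count should land close to where it was before the delete,
   as the insert fills the empty pages that were left allocated */
SELECT index_type_desc, alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(
DB_ID('AdventureWorks2019'),
OBJECT_ID('Person.PersonCopy'),
NULL, NULL, NULL);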