Categories
SQL Server

Managing Table Partitions (Part 1)

Last time out we tried our hand at implementing partitioning on a table with some data inside it. We looked at creating the partition function, followed by the partition scheme, and then applied that to our table and looked at the results. Following up on that I now wanted to look at how we can manage the partitions in that table which we’ve created.

Generally as the data which we consume increases our requirement for managing partitions is to add new ones for the extra data we need to store. Here we’ll look at that but also consider how we can split existing partitions, and in a subsequent post also see how to remove partitions and some considerations when managing them.

We’ll use the table from our previous post as the starting point for the exercises we’re undertaking here. The combined script for that can be found below:

/* Create a partition function to use */
CREATE PARTITION FUNCTION MyPartFunc (DATE)
AS RANGE RIGHT
FOR VALUES ('2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01', '2006-01-01', '2007-01-01', '2008-01-01',
    '2009-01-01', '2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01',
    '2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2024-01-01');

/* Create a partition scheme for the function */
CREATE PARTITION SCHEME MyPartScheme  
AS PARTITION MyPartFunc  
ALL TO ([PRIMARY]);

/* Create the table */
CREATE TABLE SalesData (
    ID INT IDENTITY(1, 1),
    SalesDate DATE,
    SalesQuantity INT,
    SalesValue DECIMAL(9, 2)
);

/* Apply clustering key */
CREATE CLUSTERED INDEX cx_SalesDate
ON dbo.SalesData (SalesDate)
ON MyPartScheme (SalesDate);
 
/* Populate 100k random records */
WITH RandomData AS (
 
        SELECT 1 [Counter],
            DATEADD(DAY, CAST(RAND(CHECKSUM(NEWID())) * 10000 AS INT), '1999-01-01') [SalesDate],
            CAST(RAND(CHECKSUM(NEWID())) * 50 AS INT) [SalesQuantity],
            CAST(RAND(CHECKSUM(NEWID())) * 1000 AS DECIMAL(9, 2)) [SalesValue]
    UNION ALL
        SELECT [Counter] + 1,
            DATEADD(DAY, CAST(RAND(CHECKSUM(NEWID())) * 10000 AS INT), '1999-01-01'),
            CAST(RAND(CHECKSUM(NEWID())) * 50 AS INT),
            CAST(RAND(CHECKSUM(NEWID())) * 1000 AS DECIMAL(9, 2))
        FROM RandomData
        WHERE [Counter] < 100000
)
INSERT INTO SalesData (SalesDate, SalesQuantity, SalesValue)
SELECT SalesDate, SalesQuantity, SalesValue
FROM RandomData
OPTION (MAXRECURSION 0);

Splitting a partition

If we take a look at the partition function we’ve defined here it only extends to January 2024. Any data we have from that date onwards will fall into our final partition. Let’s have a quick look shall we:

SELECT DATEPART(YEAR, SalesDate) [Year], COUNT(1) [Records]
FROM dbo.SalesData
WHERE SalesDate >= '2024-01-01'
GROUP BY DATEPART(YEAR, SalesDate);

Due to the way we created the data we’re intentionally putting more than we need to into that final partition – data from 2025 and 2026 too. As a result of that we want to look at splitting the data out of that partition into a couple of new ones.

To add a new partition to the data we don’t need to run any operation on the table itself. We make a change to the underlying partition function and the result of this is that the table will automatically be repartitioned and the data split out.

We can take a look at the number of partitions and records in each of them before we make the changes with a simple query below:

SELECT partition_number, [rows]
FROM sys.partitions
WHERE OBJECT_NAME(object_id) = 'SalesData';

Currently there are roughly 3600 records in most partitions however the final partition has around 8500 due to us having data from multiple years in there.

To make the change to our function we’d call the following command to add an extra partition which would contain data from 2025 onwards:

ALTER PARTITION FUNCTION MyPartFunc()
SPLIT RANGE ('2025-01-01');

With that operation complete we can run our previous query and see an additional partition has been created and the bulk of records from the previously-final partition (2024) have been moved into our newly-final one (2025 onwards).

At this point it’s worth noting that both of the partitions have to be written to during this operation – the previous one needs to have records removed and the new one has to have all of the records added. Due to this splitting a partition can be a resource intensive operation so appropriate timing should be considered in production environments.

Splitting another partition

As we saw previously we also have data from 2026 in our table so we should split our 2025 partition once again to accommodate this data too. Let’s go ahead and try that in the same way as before:

ALTER PARTITION FUNCTION MyPartFunc()
SPLIT RANGE ('2026-01-01');

Warning: The partition scheme ‘MyPartScheme’ does not have any next used filegroup. Partition scheme has not been changed.

Well that wasn’t the result we were looking for!

When we create a partition scheme for our function we declare which filegroup/s we want to put our partitions into. As part of this creation the scheme will set a NEXT USED property for the next filegroup to be used when creating a new partition.

However once the next partition has been created we’ll need to tell the partition scheme where to set the NEXT USED filegroup to be before we can create another partition. We’ll continue to use our PRIMARY filegroup for our partitions so we’ll run the following:

ALTER PARTITION SCHEME MyPartScheme  
NEXT USED [PRIMARY];

With that in place we can now get back to splitting our partition for the 2026 data and finally get each of our years of data into their own partitions:

ALTER PARTITION FUNCTION MyPartFunc()
SPLIT RANGE ('2026-01-01');

Adding new partitions

With the existing data being split down I now want to turn to adding extra partitions to the function.

When dealing with partitions we typically end up adding extra partitions into our functions to to cater for new data coming in. This is also usually done as an activity ahead of time to avoid having to split a partition which would require modifying the contents of two partitions.

Managing partitions in this way would be a maintenance activity and should be part of an appropriate maintenance cycle, for example prior to the start of a new year in our case.

As any unused partitions will simply sit there empty it can be common to see future partitions created in advance and left for when they’re needed to stay well away from any chance of needing to split in the future. I’ve seen it done with partitions anywhere from 1 to 10 years ahead of time to be safe.

In order for us to add more partitions its exactly the same as how we split the partitions. In our case we’ll add another 3 years into our function. We’ll also need to set the NEXT USED for the scheme as we just saw each time too:

/* Add the partition for 2027 */
ALTER PARTITION SCHEME MyPartScheme NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION MyPartFunc() SPLIT RANGE ('2027-01-01');

/* Add the partition for 2028 */
ALTER PARTITION SCHEME MyPartScheme NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION MyPartFunc() SPLIT RANGE ('2028-01-01');

/* Add the partition for 2029 */
ALTER PARTITION SCHEME MyPartScheme NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION MyPartFunc() SPLIT RANGE ('2029-01-01');

With those extra partitions created we’ve now given ourselves some growth room for a few years of extra data to come in. We could maintain this each year to ensure we’re always 2-3 years ahead of our current data.

Wrap up

Here we’ve looked at adding new partitions into our data – whether that be by splitting existing partitions or by adding new ones to the end of our partitions. We also covered the fact that splitting partitions can be an expensive operation and that we’d want to be considerate when we perform this operation. Ideally we’d like to get the partitions set up in advance with regular maintenance to provide available partitions to grow into.

I’ll be looking to add a Part 2 as a follow up to this post where we’ll look at removing (or merging) partitions as well as covering some general considerations around dealing with partitioning and maintaining them.

Leave a comment