
Changing Identity from INT to BIGINT

When creating tables we may choose to use an INT IDENTITY column as our clustered primary key. On occasion we find out that being able to store 2,147,483,647 records just isn’t enough and we need to change this column to a BIGINT to let us store… well, a lot more (9,223,372,036,854,775,807, to be precise).

Here I’ll outline one approach which can be taken to achieve this change. The general process is to create a new table, copy the data over, and then switch the tables. It also includes a check for data movement and rollback steps which can be used if any part of the process fails.

For reference our initial table can be created with the script below which will create the schema and place 1,000,000 records into it:

CREATE TABLE dbo.MyTable (
	ID INT IDENTITY(1, 1),
	Number INT,
	SmolNumber TINYINT,
	WeekdayName VARCHAR(20),
	CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (ID)
);

WITH TestData AS (
	SELECT [Counter] = 1,
		Number = RAND(CHECKSUM(NEWID())) * 100000,
		SmolNumber = RAND(CHECKSUM(NEWID())) * 255,
		WeekdayName = DATENAME(WEEKDAY, DATEADD(DAY, RAND(CHECKSUM(NEWID())) * 1000, '2020-01-01'))
	UNION ALL
	SELECT [Counter] = [Counter] + 1,
		Number = RAND(CHECKSUM(NEWID())) * 100000,
		SmolNumber = RAND(CHECKSUM(NEWID())) * 255,
		WeekdayName = DATENAME(WEEKDAY, DATEADD(DAY, RAND(CHECKSUM(NEWID())) * 1000, '2020-01-01'))
	FROM TestData
	WHERE [Counter] < 1000000
)
INSERT INTO dbo.MyTable (Number, SmolNumber, WeekdayName)
SELECT Number, SmolNumber, WeekdayName
FROM TestData
OPTION (MAXRECURSION 0);

CREATE INDEX SmolNumber
ON dbo.MyTable (SmolNumber);
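
Before we begin, it can be worth confirming the seed data has loaded and noting where the identity currently sits. This optional check isn’t part of the process itself; it just uses IDENT_CURRENT, which we’ll lean on again shortly:

/* Optional: confirm the seed data loaded and where the identity sits */
SELECT TotalRows = COUNT_BIG(*),
    LastIdentity = IDENT_CURRENT('dbo.MyTable')
FROM dbo.MyTable;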

Preparation

We’ll start the process by creating a fresh table which will ultimately be our new table. We’ll add something to make the name unique, for example a ‘_v2’ suffix. This only needs to replicate the main table structure, with the ID column now defined as a BIGINT; we don’t need to worry about indexes for now:

CREATE TABLE dbo.MyTable_v2 (
	ID BIGINT IDENTITY(1, 1),
	Number INT,
	SmolNumber TINYINT,
	WeekdayName VARCHAR(20),
	CONSTRAINT PK_MyTable_v2 PRIMARY KEY CLUSTERED (ID)
);

Next up we’re going to create a variable to store the current identity value for the table. We take this before copying any data and we’ll use it later to make sure no new records have been added since we started the process. Note that everything here needs to run as a single batch so the variable stays in scope for that later check:

DECLARE @CurrentIdentity BIGINT;
SELECT @CurrentIdentity = IDENT_CURRENT('dbo.MyTable');

Once that’s done our preparation is complete and we can get on to copying the data over.

Copying the data

To copy the data to the new table we’ll do it in batches to ease pressure on the transaction log. We’ll continue to loop, copying a batch at a time, until we pass the identity value we’ve just captured.

We’ll start by defining variables for our counter, the number of records we want to copy in each batch, and the delay between batches in case we’re causing any blocking or pressure on system resources.

As our tables use IDENTITY columns, we’ll also need to set IDENTITY_INSERT to ON for the duration of the data copy so the original ID values are preserved; once the data is in we can turn it OFF again. On each iteration we’ll take a chunk of records and insert them into the new table, then update our counter, wait the specified period, and repeat until we’ve passed the identity value we captured earlier.

DECLARE @Counter BIGINT = -1,
    @RecordsPerBatch INT = 100000,
    @DelayBetweenBatches VARCHAR(10) = '00:00:01';

SET IDENTITY_INSERT dbo.MyTable_v2 ON;

WHILE (@Counter < @CurrentIdentity)
BEGIN

    /* Copy the next batch, keeping the original ID values */
    INSERT INTO dbo.MyTable_v2 (ID, Number, SmolNumber, WeekdayName)
    SELECT TOP (@RecordsPerBatch)
        ID, Number, SmolNumber, WeekdayName
    FROM dbo.MyTable
    WHERE ID > @Counter
    ORDER BY ID;

    /* Move the counter on to the highest ID copied so far */
    SELECT @Counter = MAX(ID)
    FROM dbo.MyTable_v2;

    WAITFOR DELAY @DelayBetweenBatches;

END

SET IDENTITY_INSERT dbo.MyTable_v2 OFF;
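
Before moving on, a quick row-count comparison gives extra reassurance that the copy is complete. This is an optional sanity check rather than part of the process above:

/* Both counts should match before we consider switching */
SELECT SourceRows = (SELECT COUNT_BIG(*) FROM dbo.MyTable),
    TargetRows = (SELECT COUNT_BIG(*) FROM dbo.MyTable_v2);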

Some approaches to this may use a where clause similar to the one below:

WHERE ID >= @Counter AND ID < (@Counter + @RecordsPerBatch)

The above approach would be fine if we’re confident the IDs are sequential and no records have been removed. When that isn’t the case, our approach of using ID > @Counter with a TOP on the query is more reliable: every batch copies a full set of rows, whereas fixed ranges produce small or empty batches wherever there are gaps in the ID sequence. For completeness, here’s a sketch of that range-based loop body, assuming IDs start at 1 with no gaps (so @Counter would be initialised to 1):
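
/* Range-based variant: only reliable when IDs are sequential with no gaps */
INSERT INTO dbo.MyTable_v2 (ID, Number, SmolNumber, WeekdayName)
SELECT ID, Number, SmolNumber, WeekdayName
FROM dbo.MyTable
WHERE ID >= @Counter AND ID < (@Counter + @RecordsPerBatch);

SET @Counter = @Counter + @RecordsPerBatch;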

Now that the data has been copied over, it’s the right time to add our index back onto the table. We wouldn’t want the index in place while copying, as every batch would also have to write to it and the data would be continually shuffled. Building it at this stage, with all the data already in place, lets it complete more efficiently:

CREATE INDEX SmolNumber
ON dbo.MyTable_v2 (SmolNumber);

With the data and indexes copied over we should have a replica of our existing table, now using a BIGINT for its identity. The last step is to swap the tables around and we’ll be done.
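
If we want to double-check the new column before the swap, the metadata in sys.columns will confirm the data type. This is an optional verification rather than part of the original steps:

/* Optional: confirm the new ID column is now a BIGINT */
SELECT c.[name] AS ColumnName,
    t.[name] AS DataType
FROM sys.columns AS c
JOIN sys.types AS t
    ON t.user_type_id = c.user_type_id
WHERE c.[object_id] = OBJECT_ID('dbo.MyTable_v2')
    AND c.[name] = 'ID';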

Switching the tables

Before we rename the tables and complete the change, we need to check whether the identity value has moved during the data copy. This could happen if we’re copying a large amount of data over a period of time and processes which interact with the table haven’t been stopped.

Below we check the identity of our source table and compare it to the identity variable we took previously. If there’s a difference found we’ll raise an error and abort the script:

IF (IDENT_CURRENT('dbo.MyTable') <> @CurrentIdentity)
BEGIN
    RAISERROR(N'Identity value for dbo.MyTable has changed during migration, please rollback', 16, -1);
    RETURN;
END

Once that check passes we can go about renaming the tables. We’ll first take our original table and rename its primary key constraint followed by the table itself, then we’ll repeat for our new table, slotting it in where the previous table was:

/* Switch the old table out */
EXEC sp_rename @objname = 'PK_MyTable', @newname = 'PK_MyTable_Old', @objtype = 'object';
EXEC sp_rename @objname = 'dbo.MyTable', @newname = 'MyTable_Old', @objtype = 'object';

/* Switch the new table in */
EXEC sp_rename @objname = 'PK_MyTable_v2', @newname = 'PK_MyTable', @objtype = 'object';
EXEC sp_rename @objname = 'dbo.MyTable_v2', @newname = 'MyTable', @objtype = 'object';
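
One variation worth considering, though not part of the script above, is wrapping all four renames in a transaction so the swap is atomic and there’s no window where dbo.MyTable doesn’t exist:

/* Optional: make the swap atomic; sp_rename participates in transactions */
BEGIN TRANSACTION;
    EXEC sp_rename @objname = 'PK_MyTable', @newname = 'PK_MyTable_Old', @objtype = 'object';
    EXEC sp_rename @objname = 'dbo.MyTable', @newname = 'MyTable_Old', @objtype = 'object';
    EXEC sp_rename @objname = 'PK_MyTable_v2', @newname = 'PK_MyTable', @objtype = 'object';
    EXEC sp_rename @objname = 'dbo.MyTable_v2', @newname = 'MyTable', @objtype = 'object';
COMMIT TRANSACTION;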

This process leaves our old table in the database with the suffix _Old. Once it’s no longer required it can be backed up or dropped as needed.
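
Once we’re confident the new table is behaving, the clean-up itself is a one-liner (irreversible, so take a backup first if there’s any doubt):

DROP TABLE dbo.MyTable_Old;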

With that, the process is complete and we’ve got our new table looking the same in terms of schema and contents with our BIGINT as the identity column.

Rollback

One last step, in case we have any issues, is a rollback script. In the one below you’ll see that we check whether our ‘_Old’ table is present (meaning the switch has completed), in which case it renames the tables back and reinstates the original. Once that’s done it removes the v2 table if it’s present in the environment:

IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'MyTable_Old')
BEGIN
    /* We've switched the tables out, put them back */
    EXEC sp_rename @objname = 'PK_MyTable', @newname = 'PK_MyTable_v2', @objtype = 'object';
    EXEC sp_rename @objname = 'dbo.MyTable', @newname = 'MyTable_v2', @objtype = 'object';
    EXEC sp_rename @objname = 'PK_MyTable_Old', @newname = 'PK_MyTable', @objtype = 'object';
    EXEC sp_rename @objname = 'dbo.MyTable_Old', @newname = 'MyTable', @objtype = 'object';

END

IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'MyTable_v2')
BEGIN
    /* Remove our newly created table if present */
    DROP TABLE dbo.MyTable_v2;
END

Wrap up

Needing to change an identity column isn’t something we do regularly, as we typically understand data volumes ahead of time and can choose data types accordingly. However, when systems have been embedded in an organisation over a long period, these issues can crop up.

Here we’ve looked at a template for how that change can be achieved, with checks for data movement during the process and a way to safely roll back if any issues are encountered. Given the volumes of data involved in exhausting an INT column, it’s worth testing the scripts and varying batch sizes and delays to find the right balance for your environment.

Is this something you’ve had to do before, or even often? Do you approach it differently, or have you found any gotchas with these changes in your experience?
