Defaulting values is a typical part of data ingestion and processing within SQL. Handling NULL values can cause issues in a number of ways so there are time when we want to get these out of our data. Recently I had a question about setting default values for a field which was being processed as a NULL value and what the options were.
The Scenario
To set the stage I’m going to create a Users table and populate it with some data – with the AccessLevel for one of the records set as a NULL value:
CREATE TABLE #Users (
Username VARCHAR(50) NOT NULL,
AccessLevel INT NULL
);
INSERT INTO #Users (Username, AccessLevel)
VALUES ('Andy', 9), ('Paul', 0),
('Lisa', 5), ('Lizzie', NULL);
SELECT * FROM #Users;
| Username | AccessLevel |
|---|---|
| Andy | 9 |
| Paul | 0 |
| Lisa | 5 |
| Lizzie | NULL |
Now we’ll create a Destination table where we want to import this data into, and a procedure to copy the data over, nothing too fancy:
CREATE TABLE #Destination (
ID INT IDENTITY(1, 1),
UserName VARCHAR(50) NOT NULL,
AccessLevel INT NULL
);
GO
CREATE OR ALTER PROCEDURE #ImportUsers AS
BEGIN
/* Add new users */
INSERT INTO #Destination (Username, AccessLevel)
SELECT u.Username, u.AccessLevel
FROM #Users u
LEFT JOIN #Destination d ON u.Username = d.UserName
WHERE d.ID IS NULL;
END
So lets make a start by running the procedure and looking at the table values afterwards:
EXEC #ImportUsers;
SELECT * FROM #Destination;
| ID | Username | AccessLevel |
|---|---|---|
| 1 | Andy | 9 |
| 2 | Paul | 0 |
| 3 | Lisa | 5 |
| 4 | Liz | NULL |
This is giving us a direct copy of the data and what we want to try and do is to remove that NULL value which is popping into the AccessLevel field.
Adding a Default Value
The first approach which was mentioned was adding a default constraint to the table for that field, but let’s see what happens if we try that. We’ll remove the existing data, add in the constraint and then run the import again:
DELETE FROM #Destination;
ALTER TABLE #Destination
ADD CONSTRAINT df_AccessLevel
DEFAULT (0) FOR AccessLevel;
EXEC #ImportUsers;
SELECT * FROM #Destination;
| ID | Username | AccessLevel |
|---|---|---|
| 5 | Andy | 9 |
| 6 | Paul | 0 |
| 7 | Lisa | 5 |
| 8 | Liz | NULL |
Oh, we’ve got the same result? Yep.
Default values don’t stop values being explicitly inserted into those fields, they’ll only default the value if the field isn’t specified as part of the INSERT statement. As a test for this we’ll insert just the Username into the table and see what happens:
INSERT INTO #Destination (UserName)
VALUES ('NullTest');
SELECT * FROM #Destination;
| ID | Username | AccessLevel |
|---|---|---|
| 5 | Andy | 9 |
| 6 | Paul | 0 |
| 7 | Lisa | 5 |
| 8 | Liz | NULL |
| 9 | NullTest | 0 |
Now our new test record has correctly had the default applied to it as we’d have liked.
Default values are great for defaulting data when all of the new records being added need to have the value set, for example if you had a user with a Last Access timestamp and maybe you want to default new records to a particular point in time. If there are only some of the records which need the field to be defaulted then we want to tackle that in a different way.
Defaulting a Value
An alternative approach which would be more appropriate in the example of this procedure is with the use of the ISNULL method to replace any NULL values with one explicitly stated in the query. Lets update the procedure to make use of this:
CREATE OR ALTER PROCEDURE #ImportUsers AS
BEGIN
/* Add new users */
INSERT INTO #Destination (Username, AccessLevel)
SELECT u.Username, ISNULL(u.AccessLevel, 0)
FROM #Users u
LEFT JOIN #Destination d ON u.Username = d.UserName
WHERE d.ID IS NULL;
END
Here we’ve specified that any NULL values should be replaced with a default of 0. With that in place lets clear the data and try the process again:
DELETE FROM #Destination;
EXEC #ImportUsers
SELECT * FROM #Destination;
| ID | Username | AccessLevel |
|---|---|---|
| 10 | Andy | 9 |
| 11 | Paul | 0 |
| 12 | Lisa | 5 |
| 13 | Liz | 0 |
Success, we’ve got that value defaulted as we’d hoped! (it’s the small wins, eh)
I’ll just add a note at this point that if you were doing this type of approach on a production system it may not be feasible to truncate and reload all of the data just to apply a default. In that case it may also be worthwhile defaulting existing records and changing the column to be NOT NULL to avoid any other erroneous records creeping in:
/* Put the procedure into place first to avoid invalid new records */
/* Manually apply the default to existing records */
UPDATE #Destination
SET AccessLevel = 0
WHERE AccessLevel IS NULL;
/* Now set the column to be NOT NULL */
ALTER TABLE #Destination
ALTER COLUMN AccessLevel INT NOT NULL;
Wrap Up
Both of these approaches are great in the right use cases and will likely be used frequently if you’re involved with data ingestion or manipulation.
Applying default values are good for applying to all new records where you don’t need to specify that field as part of the INSERT statement. They can also be handy if you want to extend an existing table with a new column and by applying a default value to it you wouldn’t need to update any existing statements which are adding data to the table.
Typically when importing data we can find a lot of sources where the fields could be NULL values so the ISNULL method can be heavily used in those scenarios. As this is contained in the query too, it’s also really clear to see what’s happening with the inserts/updates at a later point when someone else might revisit the code.