A computed column lets you bake a calculation directly into your table's schema. You can also choose whether the calculated value is physically stored within the table (persisted) or calculated each time you query the table. To demonstrate, we'll use the small sales table below:
CREATE TABLE #ComputedColumns
(
SalesDate DATE,
ProductID INT,
Quantity INT,
Cost DECIMAL(6, 2)
);
INSERT INTO #ComputedColumns (
SalesDate, ProductID, Quantity, Cost
)
VALUES ('2021-01-01', 1, 2, 1.99), ('2021-02-01', 3, 1, 0.99),
('2021-03-01', 4, 120, 0.10), ('2021-04-01', 2, 9, 128.50),
('2021-05-01', 2, 18, 75.00), ('2021-06-01', 4, 44, 11.99),
('2021-07-01', 3, 19, 1.00), ('2021-08-01', 1, 12, 1.01),
('2021-09-01', 1, 2, 35.00), ('2021-10-01', 3, 26, 18.50),
('2021-11-01', 4, 5, 29.99), ('2021-12-01', 2, 102, 1050.00),
('2022-01-01', 3, 11, 1.10);
Creating a Computed Column
The syntax for creating a computed column is similar to adding a regular column to a table; however, rather than specifying a data type, we specify a calculation for the column. For example:
ALTER TABLE #ComputedColumns
ADD TotalCost AS Quantity * Cost;
SELECT * FROM #ComputedColumns;
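The new TotalCost column can then be used anywhere a regular column could be; as a quick sketch:
SELECT SalesDate, ProductID, TotalCost
FROM #ComputedColumns
WHERE TotalCost > 100
ORDER BY TotalCost DESC;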
This new column is now part of our table: it's available in the results of a SELECT statement, can be used to filter in a WHERE clause, and can sort the results in an ORDER BY. By default the data is not persisted, so it requires no extra storage, and the logic is clearly defined in the schema. We can also use non-deterministic calculations in these columns, so another option for this table could be:
ALTER TABLE #ComputedColumns
ADD IsRecentSale AS CASE
WHEN DATEDIFF(MONTH, SalesDate, GETDATE()) < 3
THEN 1 ELSE 0 END;
SELECT *
FROM #ComputedColumns
WHERE IsRecentSale = 1;
As this value isn’t stored within the database and is calculated each time we run the query, it can return different results as time moves forward. The same idea could be used in a calendar table to flag the current week or period, without needing any external process to update it each day.
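As a sketch of that calendar-table idea (the dbo.Calendar table and its columns here are hypothetical names for illustration), a current-week flag could look like:
ALTER TABLE dbo.Calendar
ADD IsCurrentWeek AS CASE
WHEN DATEDIFF(WEEK, CalendarDate, GETDATE()) = 0
THEN 1 ELSE 0 END;
Every query against the table would see this flag evaluated against today's date, with no overnight job required to keep it accurate.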
There are limitations on how complex these calculation expressions can be: whilst you can refer to other columns in the same record and call various functions, you aren't able to use sub-queries in the expression, as one example. As with most things in SQL Server, there are more details on adding a computed column via an ALTER TABLE statement in the Microsoft documentation.
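For example, a definition like the following sketch (MaxCost is just an illustrative name) will be rejected because of the sub-query:
ALTER TABLE #ComputedColumns
ADD MaxCost AS (SELECT MAX(Cost) FROM #ComputedColumns);
/* Fails - sub-queries are not allowed in
   computed column expressions */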
Defining the Data Type
Within the definition of a computed column there is nowhere to explicitly declare the data type you wish to use. When you create a computed column, SQL Server automatically uses data type precedence to determine the column's type based on the types used in the calculation. You can take a look at the data types for our table with the following query:
SELECT c.[name] [Column], t.[name] [DataType], c.[precision], c.scale
FROM tempdb.sys.columns c
INNER JOIN tempdb.sys.types t ON c.user_type_id = t.user_type_id
WHERE OBJECT_NAME(c.object_id, DB_ID('tempdb')) LIKE '#ComputedColumns%';
For our TotalCost column we're referencing a DECIMAL as well as an INT column, so the type with the higher precedence is used: the DECIMAL. Additionally, the engine determines an appropriate precision and scale for the resulting type based on the precision and scale of the fields used in the calculation. The exact rules for how the new precision and scale are determined are covered in the documentation, but essentially the engine picks a size which tries to accommodate calculations on the largest values the source data types can hold. If we want to explicitly override it, we can add a cast to the column definition:
ALTER TABLE #ComputedColumns
ADD TotalCostSmall AS
CAST(Quantity * Cost AS DECIMAL(9, 4));
/* Now we have an issue... */
SELECT * FROM #ComputedColumns;
The issue we have here is that we didn't specify a data type large enough to store the result for at least one of our records. As the column is only metadata, there are no checks when it's added; it's only when the value is calculated dynamically on the next query that the engine finds it can't store the result, and so it throws an exception. This is one example of where you may not want to override the default recommendation.
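To see why, we can work through the sizes involved. Quantity is an INT, which the engine treats as DECIMAL(10, 0) for this purpose, and Cost is DECIMAL(6, 2). For multiplication the documented rule gives the result precision p1 + p2 + 1 and scale s1 + s2, so the default type for TotalCost works out as DECIMAL(17, 2). Our override of DECIMAL(9, 4) leaves only five digits before the decimal point, and the December row breaks that:
/* 102 * 1050.00 = 107100.00 - six digits before the
   decimal point, too large for DECIMAL(9, 4) whose
   maximum value is 99999.9999 */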
Similarly to the example above, you can also see that our table has what appears to be a Boolean field using an INT rather than a BIT data type, because we keyed in integer literals for the resulting values. Again, we can add a CAST to change the resulting data type as desired. In this case, as we're explicitly defining the returned values, we can be more confident in casting the result.
ALTER TABLE #ComputedColumns
ADD IsRecentSaleBit AS CAST(CASE
WHEN DATEDIFF(MONTH, SalesDate, GETDATE()) < 3
THEN 1 ELSE 0 END AS BIT);
Persisting a Computed Column
In the previous examples we've used the approach for creating a computed column which is only a metadata operation, leaving the values to be calculated dynamically on each execution. There is a different approach whereby the values are physically stored in the database, or persisted. Here we add the extra PERSISTED keyword to the statement, which will actually write the data to disk:
ALTER TABLE #ComputedColumns
ADD TotalCostPersisted AS Quantity * Cost
PERSISTED;
When this command is run it adds the column to the schema and populates the pre-calculated value for every existing record. As this updates every record in the table, it may run for a long time, causing excessive writes and blocking on the table until it's complete.
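One caveat worth noting: PERSISTED requires the expression to be deterministic, so the earlier IsRecentSale logic can't be stored this way. A sketch like the following fails because GETDATE() is non-deterministic:
ALTER TABLE #ComputedColumns
ADD IsRecentSalePersisted AS CASE
WHEN DATEDIFF(MONTH, SalesDate, GETDATE()) < 3
THEN 1 ELSE 0 END
PERSISTED;
/* Fails - the column cannot be persisted because
   the expression is non-deterministic */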
One advantage we do see with the persisted column is that the ALTER TABLE will fail if it's unable to calculate the value for the new field, something which wasn't caught in our earlier example. In that case you'll see a failure with the following:
ALTER TABLE #ComputedColumns
ADD TotalCostSmallPersisted AS
CAST(Quantity * Cost AS DECIMAL(9, 4))
PERSISTED;
If we didn't have the records with large values causing the overflow, this statement would complete, add the column, and populate the new field. Whilst that sounds like a good outcome, we now have a danger that any INSERT statement against the table with overly large values can fail, because the persisted computed column won't be able to populate correctly, so overriding the data type can cause ongoing issues.
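As a sketch of that danger (assuming TotalCostSmallPersisted had been added successfully to a table without the oversized rows), a later insert like this would then fail with an arithmetic overflow:
INSERT INTO #ComputedColumns (SalesDate, ProductID, Quantity, Cost)
VALUES ('2022-02-01', 2, 500, 999.99);
/* 500 * 999.99 = 499995.00, which cannot
   fit in DECIMAL(9, 4) */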
As a Bonus
Just a little bonus if you’re feeling bored and are wondering if anyone is sitting around running SELECT * queries from your table…
ALTER TABLE #ComputedColumns
ADD DontSelectStar AS 1 / 0;
Ok, so maybe don’t do this in production – I know you might be really tempted to, but in the back of your mind you also know there’s an app somewhere that’ll be making wildcard queries against your database, because that’s just how they roll.