Categories
SQL

Numeric Data Validation

Data validation is key when ingesting from external sources. As we can’t always be certain of data quality we inevitably find bad data which needs to be handled. Here I wanted to look at a couple of options for validating numeric data.

Here’s the scenario – we’ve got data which may have been received via a flat file or passed into our database, and it should be a numeric value. How can we weed out the valid from invalid data?

Let’s create some sample data to start with:

CREATE TABLE #Numerics (
	ID INT IDENTITY(1, 1) PRIMARY KEY,
	[Value] VARCHAR(50)
);

INSERT INTO #Numerics ([Value])
VALUES (NULL), (''), (' '),
	('0'), ('-1'), ('1'), (' 1'), ('2 '),
	('123456'), ('1,234,567,890'), ('12345678901234'),
	('$'), ('£1.23'), ('One'), ('-1.23E45');

Ole faithful

One of the tried and tested ways to validate if a value is numeric is the built in SQL Server function ISNUMERIC(). It’s been around forever and sometimes seen as a go-to, but we’ll challenge that.

The online documentation describes its functionality as:

Determines whether an expression is a valid numeric type.

That is to say, whether the value could be converted to one of the numeric data types successfully. This can be one of the challenges with the function. It covers a broad range of numeric data types. But it doesn’t tell us which.

Let’s look at how that handles our sample data:

SELECT [Value],
	[IsNumeric] = ISNUMERIC([Value])
FROM #Numerics;
Results of sample data being ran through ISNUMERIC function

Looking down those results I think we can agree that it broadly makes sense. Even the ‘$’ on its own could be assumed as a zero value so let’s go with it.

Now we face a problem. What type of numeric value are they?

We can try to parse the results which return as numeric, but there isn’t actually a common data type between them:

/* Fails due to '1,234,567,890' */
SELECT CAST([Value] AS INT) FROM #Numerics WHERE ISNUMERIC([Value]) = 1;

/* Fails due to ' ' */
SELECT CAST([Value] AS DECIMAL) FROM #Numerics WHERE ISNUMERIC([Value]) = 1;

/* Fails due to '-1.23E45' */
SELECT CAST([Value] AS MONEY) FROM #Numerics WHERE ISNUMERIC([Value]) = 1;

In most situations we’ll have a reasonable idea of the data we should be seeing, however that isn’t always the case. The function only returning a Boolean value isn’t really up to the task we have at hand.

Modern approach

I hesitate to call this a modern approach as it’s been around since 2012, but here we are. In SQL Server 2012 there were a number of TRY_ functions added to extend existing features. Here we’ll look at using the TRY_CAST() function as an alternative to ISNUMERIC above.

The online documentation describes this function as follows:

Returns a value cast to the specified data type if the cast succeeds; otherwise, returns null.

Right off the bat we can expect two differences between this and our previous function:

  • This will return the value rather than a Boolean indicator
  • We can specify the data type to be converted too

They’re considerable differences to the the way ISNUMERIC works and should make the results much more usable.

We’ll start with the parsing of the values from our sample data to compare the results. For this we’ll choose a FLOAT data type to capture a broad range of values:

SELECT [Value],
	[IsNumeric] = ISNUMERIC([Value]),
	TryCast = TRY_CAST([Value] AS FLOAT)
FROM #Numerics;
Comparison of results from ISNUMERIC function with TRY_CAST function

Those results are a little different aren’t they.

We can see no issues with leading or trailing spaces, however empty string are still parsed as zero which differs to the ISNUMERIC result. Further down we can see that separators and currency symbols are not being parsed, however scientific notation is.

Is this better or worse? – I guess that depends on your requirements.

In most situations this approach would be preferable due to:

  • The function returning an actual result
  • Data typing is defined ahead so no guess work
  • Values which fail to parse can be identified rather than failing the statement

Overall I’d consider this a better alternative as it is much more robust and consistent. Whilst it may not be a direct replacement if you see any of the edge cases where it differs to ISNUMERIC results I’d prefer this option for new developments.

Wrap up

In this post we’ve looked at the ISNUMERIC and TRY_CAST functions and how they differ in processing text to identify potential numeric values.

Part of the reason I decided to collate this list was coming across some not-so-old code which was still relying on ISNUMERIC() function for checking the data passing through.

For catching some basic errors such as text dropping into a numeric column, the ISNUMERIC function may be sufficient. When we start working with more complex data, particularly numeric values which may be formatted (e.g. currency, scientific notation) the newer TRY_ functions can shine.

We’ll expand on this next time by looking at a couple of other TRY_ functions which are available. We’ll see see how they compare to ISNUMERIC and TRY_CAST, and what alternatives they might provide.

2 replies on “Numeric Data Validation”

Leave a comment