We store a wide variety of data within our platforms, some of which will inevitably be string based. Along with that comes a need to perform manipulation of those strings. When slicing, combining, or manipulating strings one element which we need to consider is the length of the string.
Here I wanted to have a look at two functions for retrieving the length of a string and show considerations when using those approaches, as well as how they compare to each other.
LEN function
The go-to when grabbing the length of a string is using the LEN function. This will typically provide us with the length of a string which we have, for example:

That’s it, done. Right?
Well, not exactly. You see there’s this thing with LEN which I forget about and comes back to bite me now and then. Let’s try the same test again but with a couple of spaces added either side:

I’ve added 2 spaces at the start and another 2 at the end. How come the length has only changed from 11 to 13 and not 15? There’s a very specific point in the documentation which covers it:
Returns the number of characters of the specified string expression, excluding trailing spaces.
I don’t know why this only excludes trailing spaces and not leading ones, that feels a little odd. But it does, so what option could we use to avoid that trap?
DATALENGTH function
Another function we have to check the length of a string is the DATALENGTH function which you may have seen me use in last week’s post for recursive functions since we were dealing with trailing spaces.
So how does this handle leading and trailing spaces compared to the LEN function:

That’s much better for us. The DATELENGTH function doesn’t trim any spaces from our string so we can get an accurate length. Well, sort of.
If we try to do the same again but this time we use a Unicode string we’ll get something a little different:

Again the online documentation tells us everything we need to know about this function:
This function returns the number of bytes used to represent any expression.
This is the reason I don’t see much call for the DATALENGTH function as it’s actually telling us the length of the string in terms of the number of bytes it’s taking to store the value. We can also see that when we’re using a fixed length string:

What about LEN function
Looping back to the LEN function briefly here I thought it was worth showing how that function operates when we’re using Unicode and fixed length data types:

We still have the challenge of the function trimming the trailing spaces from the strings as we saw earlier, however we have results more consistent with what we’re expecting for a string length as opposed to how the DATALENGTH function treats these strings.
So is there an answer to get the correct length for any and all strings?
Comparison
Unfortunately there really isn’t a one size fits all option here. We’ve got 2 very different functions which can give us exactly what we’re after under specific conditions.
We have the LEN option which gives us consistent and broadly expected lengths for our strings, however we need to be mindful not to get tripped up with trialing spaces being removed.
The DATALENGTH function doesn’t trim any characters from our string however it returns for us the number of bytes needed to store the string. This means we can get accurate lengths for our VARCHAR string, however if we’re using Unicode or fixed length strings the function isn’t appropriate for a character length.
There are options to get the functions working in our favour – for example you could replace spaces with another character before using the LEN function but that will get a little bit of string manipulation very bloated very fast.
My personal preference with strings is to stick with the LEN function under most conditions, with a But. That But is that I like to TRIM strings before storing them in the database to avoid any ambiguity in the data. It keeps the data cleaner and more consistent so unless there’s a really good reason not to, I prefer TRIMmed.
TL;DR
As we’ve got a conclusion above I thought I’d summarise the points made here to wrap up with:
- Length functions are commonly used in SQL Server, we have a couple of options
LENfunction can catch you out by trimming trailing spacesDATALENGTHfunction will give an accurateVARCHARlength with trailing spacesDATALENGTHprovides byte counts forCHAR,NCHARandNVARCHARtypesLENfunction provides accurate results forCHAR,NCHARandNVARCHAR, excluding trailing spaces- Could try to manipulate functions such as
REPLACE(@MyString, ' ', '_')but makes code very bloated - Try to avoid data with leading / trailing spaces such as using
TRIMbefore storing them