String parsing is a regular occurrence when it comes to data engineering. Splitting strings, converting, combining – all in an effort to bring some order. Sometimes we need a more powerful tool, and that would be Regular Expressions (Regex).
Regular expressions are something I have a fondness for under the right conditions, and with them being introduced in SQL Server 2025 I wanted to dive in and take a look.
What are regular expressions
A Regular Expression defines a pattern which text is compared against to see if it matches. They’re extremely helpful for data validation or pattern matching, for example checking if an email is valid. At first glance these patterns can appear complex as they contain notation through various symbols.
As an example, if you wanted to search for a year within some text, you might use the pattern \d{4} which looks for 4 digits. If we wanted to be more specific and limit to 19** or 20** we could change this to (19|20)\d{2} which defines 19 or 20, followed by 2 digits.
Let’s look at something a little more complex. Here’s one option for validating an email address:
^[a-zA-Z0-9_.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$
Ok that’s a lot right. Let’s break it down:
^– start of string[...]– group defining options for a single charactera-z– range of characters (a, b, c… y, z), likewise for A-Z, 0-9_.– list of individual characters which are allowed+– matches the previous token (the[...]group) one or more times; i.e. multiple characters@– an explicit character to match, as it’s not contained in a group (likewise for.)\.– another explicit character, however some need escaping as they’re used for notation (.matches any single character)$– end of string
☝️ This is where things get trickier – understanding the syntax to be sprinkled throughout your pattern to get the right results. There are a lot of options which lend themselves to very powerful results.
To help remedy this confusion, there’s plenty of support available online. You can use sites like RegexLearn or RegexOne to walk through their tutorials. Alternatively if you’re crafting a pattern you could check out RegExr or Regex101 which include community samples along with highlighting and tooltips whilst you build your expression.
If you’re in a pinch and don’t want to trawl the syntax, AI can also build some pretty slick expressions given the well defined and explicit nature of them. A bit of guidance and a few examples will certainly start you on the right footing.
Now that we understand what a Regular Expression is, let’s see how we can make use of them with the new regex support in SQL Server 2025.
What functions are available in SQL Server 2025
As part of the SQL Server 2025 preview, there are 7 functions which have been introduced to work with regex patterns in a variety of ways:
REGEXP_LIKE– aLIKEfunction returning true or false based on matching to a regular expressionREGEXP_REPLACE– replace a portion of a string which matches an expression with another stringREGEXP_SUBSTR– extracts the portion of text which matches the regular expression. If there are multiple, only the first is takenREGEXP_INSTR– finds the character at which a pattern matches in the string, similar toCHARINDEX(but parameters are in the opposite order)REGEXP_COUNT– the number of times a given pattern can be matched within the textREGEXP_MATCHES– similar toREGEXP_SUBSTRbut this is a table value function which returns all matches for the specific pattern, with positional detailsREGEXP_SPLIT_TO_TABLE– another table value function, similar toSPLIT_STRINGbut with a regular expression instead of delimiter
Here I’ll pick out a few examples which I think are particularly interesting. If you want to understand any of the regex used in more detail, check out those links above which will do a better job than I would in a few words.
Firstly we’ll use a version of the email validation from earlier to anonymise the domain from the address by replacing with static asterisks:
SELECT REGEXP_REPLACE(
ContactDetails,
'([a-zA-Z0-9_.]+)@([a-zA-Z0-9]+)\.([a-zA-Z]{2,})',
'\1@*****.\3')
FROM dbo.Customers;
/*
From: 'Email to regex.test@mydomain.net'
To: 'Email to regex.test@*****.net'
*/
Next up, we’ll use the StackOverflow database to find which users emphasise their point the most by finding the ones with the most upper case words in their responses (he who shouts loudest, and all that):
SELECT OwnerUserId,
SUM(REGEXP_COUNT(
Body,
'\b[A-Z]{2,}\b'
))
FROM dbo.Posts
WHERE PostTypeId = 2
GROUP BY OwnerUserId;
Finally, we can find customers with valid UK phone numbers without needing to use legacy functions such as LEN() or TRY_CAST() – and ignoring any spacing – with this expression:
SELECT *
FROM dbo.Customers
WHERE REGEXP_LIKE(
Telephone,
'^\s*(?:\d\s*){11}\s*$');
/*
'1234567890' Not Matched
'12345678901' Matched
'12345678901234' Not Matched
' 1234 5678 901 ' Matched(!)
*/
I think this last example shows some of the true power expressions can bring. We’ve removed any need to chain traditional functions, such as substituting whitespace characters, checking string length, and confirming the result is numeric. All in a 21-character pattern 🔥
I’ll note that all these functions have an optional parameter for a flags string which can be used to indicate case insensitivity (i), multi-line mode (m), allowing . to match new-line characters (s), and case sensitivity (c). If not supplied the value will default to 'c' so case sensitivity is in place by default, hence using patterns including [a-zA-Z].
If you’d like to read more about the regular expression features in the SQL Server 2025 preview, check out this article on Microsoft Learn.
Wrap up
I have a fondness for regular expressions, especially when there’s a remotely complex parsing process which could be vastly simplified with their usage. Their syntax can look daunting at first sight but once you’ve found a couple of use cases you’ll see opportunities everywhere.
In this post we’ve looked at what regular expressions are, and then moved on to see the functions available in SQL Server 2025 – with a few examples. Whilst they may not be as exhaustive as in frameworks like .NET, they cover a broad number of use cases; of which we’ve demonstrated a few.
There’s a degree of art to creating patterns but there’s plenty of support out there to help. If you try to use an invalid pattern with the function, they’ll throw up errors. However if you use patterns which technically valid but completely impractical, you’ll end up with a whole lotta nothing returned. Thankfully the same places which help craft the patterns have options for testing them too.
As ever, new features come with new quirks. If you’re planning to introduce these into your workload it’s worth keeping an eye on performance, and gotchas such as SARGability for REGEXP_LIKE – only in the cloud.
Overall though, Regular Expressions bring more flexibility to help us do what we need and ditch some of the legacy string manipulation we’ve wrangled with for decades. Wielding these expressions can be formidable in bringing order to the chaos of string manipulation. If all else fails and Regex can’t deliver, just give it to AI, it’ll be taking over soon anyway.
One reply on “Exploring Regular Expressions in SQL Server 2025”
[…] Steve asks us about our favourite string parsing routines. It conveniently popped up as I was trying out the new Regular Expression (Regex) functions in SQL Server 2025. I adore them for handling string validation and parsing, which I’ve historically done in […]