Categories
SQL

Challenges with Dynamic Data Masking

Recently we’ve been looking at Dynamic Data Masking and what it can achieve for us. With that said however, there are some drawbacks to how it works which its worth being aware of before you consider implementing it.

Even in the official documentation it states that the feature “limits” exposure, can be used to “simplify” design, and it “helps prevent unauthorised access” – notice that the wording is not absolute, this is only part of a wider design and security framework around your data.

With that out of the way let’s take start with a bit of sample data for us to play around with:

CREATE TABLE #Employees (
    ID INT IDENTITY PRIMARY KEY,
    EmployeeName VARCHAR(50)
        MASKED WITH (FUNCTION = 'default()'),
    Salary DECIMAL(9, 3)
        MASKED WITH (FUNCTION = 'random(20, 200)')
);

INSERT INTO #Employees (
    EmployeeName,
    Salary
)
VALUES ('Andy', 100),
    ('Dave', 50),
    ('Steve', 150);

The challenge with masking

When we apply the masking to the data, the underlying data isn’t changed. The masking is metadata which is applied when data is presented to unprivileged users. The challenge which this creates for us is that when its the database engine using the data this will be in its raw (unmasked) state.

As a result of the engine having access to the data, any expressions used when filtering the data for example, would be based on the actual data rather than the masked version. In this example if we were to look at the data as a user with limited privilages we’d see the employee names as xxxx however let’s try the following queries and see what the results would be:

SELECT *
FROM #Employees
WHERE EmployeeName = 'xxxx';

SELECT *
FROM #Employees
WHERE EmployeeName = 'Dave';

Trying to filter by the masked value won’t return anything as the filter doesn’t match the underlying data. You can see however that we’ve been able to find the record for Dave even though we still can’t see that name. If we went further with this example it could allow us to query additional tables based on the ID which we’ve found for that employee.

Let’s see another example

So let’s take this a stage further – what if I wanted to know how much Steve earns or if he might be earning more than me? Well yea we can do that sort of thing too:

/* Who earns more? */
SELECT
	a.Salary [Andy],
	s.Salary [Steve],
	CASE
		WHEN s.Salary - a.Salary = 0 THEN 'Earn the same'
		WHEN s.Salary - a.Salary > 0 THEN 'Steve earns more'
		WHEN s.Salary - a.Salary < 0 THEN 'Andy earns more'
		END [WhoEarnsMore]
FROM #Employees a
	CROSS JOIN #Employees s
WHERE a.EmployeeName = 'Andy'
	AND s.EmployeeName = 'Steve';

/* How much does Steve earn? */
SELECT
	Salary,
	CASE
		WHEN Salary BETWEEN 100 AND 110 THEN '100-110'
		WHEN Salary BETWEEN 110 AND 120 THEN '110-120'
		WHEN Salary BETWEEN 120 AND 130 THEN '120-130'
		WHEN Salary BETWEEN 130 AND 140 THEN '130-140'
		WHEN Salary BETWEEN 140 AND 150 THEN '140-150'
		WHEN Salary > 150 THEN 'More than 150'
		ELSE 'Less than 100' END [SalaryRange]
FROM #Employees
WHERE EmployeeName = 'Steve';

Well that’s a little eye opening isn’t it. This shows us how we can stretch the masking a little further to start teasing out the underlying data. You can also see that the results returned in both cases are representative of our source data rather than the masked values being returned.

Of course the more sensitive data we have in here the more susceptible we’ll be to someone being able to find more than we’d like. For example how much more information could someone gleam if there were an identifier for the line manager in the table – you could find a certain person’s record and then have visibility of a whole team with very little effort.

One step further

It should be noted that there are some areas where the masking does catch us, for example if we tried to subtract the salaries in the example above to find the exact difference we’d always see a result of zero – but if there’s someone who wants to get into your data they will likely find a way.

So let’s take it one step further and see what we can do with a little creativity:

CREATE TABLE #Letters (
	Letter CHAR(1)
);

/* Populate with the alphabet */
INSERT INTO #Letters (Letter)
SELECT TOP 26 CHAR(64 +
	ROW_NUMBER() OVER(ORDER BY (SELECT 1)))
FROM sys.all_objects;

SELECT
	DISTINCT
	ID,
	EmployeeName,
	CONCAT(L1.Letter,
		L2.Letter,
		L3.Letter,
		L4.Letter,
		L5.Letter) [DecodedName]
FROM #Employees e
	LEFT JOIN #Letters L1 ON SUBSTRING(e.EmployeeName, 1, 1) = L1.Letter
	LEFT JOIN #Letters L2 ON SUBSTRING(e.EmployeeName, 2, 1) = L2.Letter
	LEFT JOIN #Letters L3 ON SUBSTRING(e.EmployeeName, 3, 1) = L3.Letter
	LEFT JOIN #Letters L4 ON SUBSTRING(e.EmployeeName, 4, 1) = L4.Letter
	LEFT JOIN #Letters L5 ON SUBSTRING(e.EmployeeName, 5, 1) = L5.Letter;

Heh. So yea, that works too.

Wrap up

When applying Dynamic Data Masking to your data its important to consider who will have access to it and what they may be able to do with that access. If they can write ad-hoc queries against the data then they potentially have the ability to circumvent the masking with some carefully curated queries. As we said at the start this should really only form one part of a wider design and security framework around your data.

Leave a comment