As Deborah’s invitation opens this month she quite rightly points out that there are a lot of professionals out there who have a few (passionate) opinions about what they do or the way they do it.
This month we’re invited to … well, have a rant – a rant about those experiences which have shaped our perspectives for those areas where it should depend.
So, Linked Servers. Yep they’re still a thing, they can still solve a problem, so should we use them? No, please don’t.
Let’s take a look at a few of the reasons why I’m not a fan of linked servers:
Logical separation
When you’ve got data physically separate in different environments there’s usually a reason for it. Yes it may very well relate, but there’s a reason the data isn’t already in one place. This should be a red flag.
We’ve seen the optimizer make some interesting choices when we’re dealing with tightly coupled data in the same database so imagine how it might react when we try to join data across different servers?!
As an example, if we’re trying to filter a remote dataset based on a join and a filter from a table in our local environment what should the engine do? – would you want to send a huge list of key values to the remote server to scan for, or would you pull the entire remote dataset to the local environment before it’s filtered?
Of course the answer is it depends, and it only takes the wrong choice in a couple of instances to start causing some real performance headaches.
Another point to make here is that if in fact you do retrieve a large amount of data over the network to process locally then you’ll also find that it isn’t cached like it would be with a local index, so every execution will need to repeatedly read and transfer the data from the remote environment.
Can we use statistics?
When we’re considering about which approach to use when filtering data above we may consider what impact statistics would have on that decision. Would that decision be clearer if we could see the statistics for that remote table?
Yet again the answer is it depends!
Whilst most of us may be fortunate enough to be working on more modern versions of SQL Server, spare a thought for the folks working on pre-2012 environments and trying to use linked servers.
If you’re on these older versions of SQL Server then you’ll find that the accounts used for your linked servers won’t have access to table statistics by default. They’d need to be granted elevated permissions to the database such being members of either the db_ddladmin or db_owner roles – which you certainly wouldn’t want for a reporting account.
Without the visibility of these statistics you’re tying one of the engine’s hands behind it’s back when you ask it to build an execution plan. Whilst this wouldn’t be immediately visible when trying to run a query you may see the impact of it quite fast.
It’s a bit obscure, in part due to it only impacting legacy versions of the database engine, but you can bet there are folks out there still using the feature and blissfully unaware of the potential impact.
A bad smell
The final point which pops into my mind when I think about a linked server is that its usually developers using these as a crutch in their code.
You might also find they think an entire task should sit in a single stored procedure regardless of how lengthy it may be, or all of the work should be in a single statement using a copious number of CTEs and obscure cross-database joins.
Yea those things get at me too – albeit not inherently bad – but they can be warning signs when you see them used excessively. Using linked servers is like a bad smell that can indicate bad habits creeping in.
What can we do about it?
If you want to move data around between environments and you’re considering using linked servers then there are a few better options available for you:
- Look into data replication technologies for propagating data across environments and ensuring they remain consistent regardless of what process may be consuming it
- Try out ETL tooling (such as SSIS) to help you move data around and separate the movement of data from the processing of it
- If all else fails why not try using
OPENQUERYwith a linked server and have more control over the direction of travel for the data, pulling it locally before working on it
These types of options can provide you with more effective control and logical separation through your workflows so it might be worth trying out one or two of these and see how you get on in your use case – you might be pleasantly surprised by the performance difference.
Wrap up
As we mentioned at the start, the key thing here is that it depends on circumstances. Are my opinions above true in all cases, certainly not – but if someone is asking the question about linked servers then it’s likely that some of these will be true and may not have been considered which is why I take the view which I do.
These opinions come from experience with products or technologies so whilst it’s not always wise to have preconceived ideas, it can be worthwhile listening to those with the passionate opinions as there will typically be a good reason for it. Plus, if you disagree you can go try it out and build your own passionate opinions in the process.
Whilst we have the freedom to rant about things where ‘it depends’ it’s worth mentioning NOLOCK which is quite a straight forward one – just don’t. Please don’t ask and please don’t do it. Thank you kindly.