T-SQL Tuesday #147 – Upgrade Strategies

This month’s invitation from Steve asks about strategies for upgrades with SQL Server. We might be more familiar with patching a SQL environment semi-regularly – and those processes may be tried and trusted – but it’s much rarer that we have the opportunity to uplift an entire environment to a later version.

Here are a few of my thoughts on working through the upgrade process and some of the key areas to consider along that journey. I’ll include migrations here too, as I think most of us would typically prefer a fresh environment to migrate to rather than attempting an in-place upgrade.

Why are we upgrading?

Upgrades aren’t trivial. One of the first things we need to understand is why we’re upgrading.

Do we need to finally move from a legacy version so we can make use of columnstore on a data warehouse? Is there a need to be on a supported version for accreditation? Does a supplier need us to move to a new platform to support their application requirements? Do we need to rebuild the environment to uplift the operating system?

There may be occasions when we need to rationalise an upgrade and be able to dive into and demonstrate the value of the new features or benefits we’re looking to realise. That said, with SQL environments it usually tends to be “if it ain’t broke, don’t fix it”, so if there’s a reason to upgrade it’s usually one we need to face into one way or another.

How will we do that?

The preparation is going to be a large portion of the work to get this project moving. It’ll be individual to every project so I’m just covering a few areas from my own experiences.

As the old saying goes, “Good, fast, cheap. Choose two”. This is also true of upgrades, and planning for one needs to factor it in: we need to understand where we’ll make trade-offs to get the upgrade across the line, because we can’t have it all.

  • Understand the scope – what do we need to move, what services are we hosting, and what are the timescales and other constraints we’re working within?
  • Check your documentation – whatever you have in terms of the environment. Check your server configuration, service accounts, firewall rules, etc. If we haven’t got any of that to hand it’s a great time to start putting it together to get everyone on the same page (the first sketch after this list is one way to start capturing the basics)
  • Identify the dependencies – there will almost certainly be databases reliant on others, applications working alongside specific databases, and reporting tools querying the environment. We need these dependencies understood, both as a checklist for the migration and so that we can effectively test the process and the impact on them (the second sketch after this list can help surface some of these)
  • Find the debt – there will likely be some degree of technical debt in there, maybe something which nobody dares touch or something that’s been lurking over the years as the server has gone through so many development iterations and software releases. Find it and make sure it’s noted
  • Pick your battles – between any debt you find and the other things you want to change or improve while you have the chance, you’ll need to consider how these factor into the timescales and pick your battles – be those the greatest risks or the lowest-hanging fruit. We can’t usually fix everything, unfortunately, but we’ll keep notes on any debt we carry forward because it really should be tackled at some point
  • Script as much as you can – having scripts or deployment processes which you can automate helps enormously with repeatability and consistency, so anything you can do here (dbatools.io may be of some use) can make life much easier when we get around to the doing. Approach each element of the upgrade focussing on automation wherever possible
  • Test – and then test it again. What needs to be tested, and by whom? Is it the data engineers, the BI team, the operational users? Who has the vested interest in getting this right, and how can we get them on board to support this critical stage in the process?
  • How will we roll back – clearly we don’t want to roll the upgrade back, but it’s safer to have a plan and not need it than to not have a plan and need one
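As a starting point for that documentation, a few catalog views and SERVERPROPERTY calls will capture the headline instance and database settings. This is just a minimal T-SQL sketch (dbatools.io can export far more detail); the idea is simply to have a baseline to compare the new environment against:

    -- Headline instance details worth recording before the upgrade
    SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
           SERVERPROPERTY('ProductLevel')   AS ProductLevel,
           SERVERPROPERTY('Edition')        AS Edition,
           SERVERPROPERTY('Collation')      AS ServerCollation;

    -- Instance-level configuration (max server memory, MAXDOP, and so on)
    SELECT name, value_in_use
    FROM sys.configurations
    ORDER BY name;

    -- Per-database settings that often matter during a migration
    SELECT name, compatibility_level, collation_name, recovery_model_desc, state_desc
    FROM sys.databases
    ORDER BY name;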
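For the dependency hunt, sys.sql_expression_dependencies can surface cross-database and linked-server references hiding inside views, procedures, and functions. It won’t catch anything an application does from the outside, so treat a query like this as one input to the dependency list rather than the definitive answer:

    -- Objects in the current database that reference another database or server
    SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
           OBJECT_NAME(d.referencing_id)        AS referencing_object,
           d.referenced_server_name,
           d.referenced_database_name,
           d.referenced_schema_name,
           d.referenced_entity_name
    FROM sys.sql_expression_dependencies AS d
    WHERE d.referenced_database_name IS NOT NULL
       OR d.referenced_server_name IS NOT NULL
    ORDER BY referencing_object;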

Ideally we want to document as many of these as we can to share with the folks we’re working with to get this thing moving. We’ll need to articulate the scope of how much we need to do and where the responsibilities sit for the tasks to make sure we can track things effectively.

It’s been done before

We’re not doing anything that someone hasn’t tried before (or if we are, maybe there’s a better way). There will be folks out there – such as the ones participating in this invitation – who will have seen and worked through most types of issues before and we’ve got a great community out there willing to share.

Sometimes it doesn’t go so well – I’ve seen OS upgrades destroy clusters before we even got the chance to fire up the SQL services – and sometimes it goes flawlessly – conversely I’ve seen flips between data centres happen within seconds as a result of a memory stick with backups driven across the country coupled with some careful log shipping over a WAN. There are plenty of options whatever the situation that we’ll likely be facing.

If at first you don’t succeed…

At this point we should have collected all of the details we need to make the upgrade happen. This is the time when we take the documentation and plans and put them into action. Upgrade the development environment, pre-production, or build the new environments up and work through all the steps we planned previously.

We should have some sort of a checklist to go at, running deployment scripts or packages, restoring databases, and configuring all the services as needed. Once they’re all in place it’s time to get the testing underway, covering all of those dependencies we identified earlier. You don’t want to be at the start of the next week or month before you find some business-critical task which hasn’t worked since the upgrade.
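If the databases are moving by backup and restore, the per-database steps on that checklist tend to look something like the sketch below. The database name, file paths, and compatibility level here are purely illustrative (150 corresponds to SQL Server 2019; adjust for the target version): restore, raise the compatibility level once testing says it’s safe, then run the usual post-restore checks before handing over to the testers.

    -- Restore onto the new instance (database name and file paths are illustrative)
    RESTORE DATABASE SalesDB
    FROM DISK = N'\\backupshare\migration\SalesDB.bak'
    WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB.mdf',
         MOVE N'SalesDB_log' TO N'L:\Logs\SalesDB_log.ldf',
         RECOVERY;
    GO

    -- Raise the compatibility level once testing confirms it behaves as expected
    ALTER DATABASE SalesDB SET COMPATIBILITY_LEVEL = 150;
    GO

    -- Post-restore sanity checks before handing over for testing
    DBCC CHECKDB (SalesDB) WITH NO_INFOMSGS;
    GO

    USE SalesDB;
    GO
    EXEC sp_updatestats;  -- refresh statistics on the new version
    GO

Don’t forget the pieces that sit outside the databases themselves – logins, agent jobs, linked servers and the like – which is where the scripted, automated approach from the earlier list really pays off.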

This is usually the most iterative part of the process where we’ll need to go back and make changes to the documentation, deployment scripts etc. to correct for any issues which may show up. Having the ability to quickly roll forwards and back (such as VM snapshots) can be immensely helpful to allow you to replay specific parts of the process if you’re trying to iron out any issues with setup or deployments which have crept in.

Let’s do it!

There’s been a huge amount of work to get us to this point, with the planning and then running through the test cycles to make sure everything is just right, so by now we should be confident in the process. Let’s get at it!

Upgrades aren’t trivial. But hopefully by this point in the process it’ll be much more trivial than it was.

It’s rinse and repeat, just as we did with the pre-production environment, and all being well we’ve got a seamless upgrade on our hands. It doesn’t mean that something won’t come out of the woodwork in the process, but that’s why we’ve mapped and tested everything out as thoroughly as we have, so these issues should be few and far between, and if they do occur then hopefully we can react and fix forward. In the worst case, we’ve built up a solid rollback plan, so we action that, head back to the testing, and come back to attack it again another day.

Lessons learned

It’s usually helpful – particularly for larger projects – to regroup afterwards and reflect on how things went, what went well, and what, if anything, could be improved for next time.

Take the time to loop back around and review the process and where anything might have fallen through the cracks, get the documentation brought up to date, and maybe look at prioritising some of that technical debt we’ve had lying around. Don’t skip this part; it’ll likely flush out some valuable insights.
