Solving Bottlenecks in SSIS Packages

Last time out we started to look at optimising SSIS packages by showing how to identify bottlenecks with a handy script. This time we’re turning insights into action to solve those pain points.

The solutions are grouped into 3 areas: Data Flows, as they do a lot of heavy lifting; the Execute SQL task, which can also be used for transformation and calculations; and finally everything else (because the first two are usually the issue).

Data flows

Data flows are a likely culprit for bottlenecks as they can perform large amounts of work. Their performance depends on the source and destination locations, and particularly in how we’re manipulating the data in between.

It can be helpful to change the SSIS logging to ‘Performance’ or a custom level to dig into these. That will populate the Execution Performance report and allow you to identify components in the flow which are causing bottlenecks.

Here are some posts which might help solve your performance issues:

Removing blocking transformations improves flows dramatically
I covered a specific example of removing sort transformations which reduced execution by nearly 50%
Replacing multiple lookups with a cache is more efficient when using the same source
Reduce resources required for a large lookup / cache using a selective caching approach
Right-sizing data flow buffers can help with throughput if there isn’t a single bottleneck
Review data sizes to avoid spilling to disk

If you want to dive into the data flow further, consider debugging options within Visual Studio to step through the flow.

Update: Kevin made a fantastic point that sometimes data flows just aren’t the right place for transformation. Moving a cumbersome flow into a database engine might be the right call.

Execute SQL task

I know we’re looking at SSIS bottlenecks here, but a good portion of packages may be waiting for your SQL engine to complete some work. That doesn’t mean we can’t help things along.

There’s a vast amount of advice online to help with SQL performance issues. At a very broad level, it’ll come down to how the query is written, and what indexes are available to use. They’re not small topics.

Whilst I can’t cover those topics in depth here, below are some posts I think may be particularly helpful:

Review missing index recommendations to consider
Apply covering indexes which are very beneficial in the right conditions
Indexing foreign key columns can be broadly helpful (also be aware of impacts too)
Refactor or use indexed computed columns to solve non-sargable predicates
Check for implicit conversion which could be hiding in procedures
Try using EXISTS instead if the query uses DISTINCT as a crutch
Allowing for multiple execution plans can help if you’re a victim of parameter sniffing

If contention is wider across the environment you’ll want specific tools to help out. Query store is a game changer for quickly finding performance outliers to tackle. There are 3rd party solutions such as the fantastic First Responder Kit (repo) (and Brent’s demo) which can give you a head-start for £free too.

Other tasks

In my experience the Execute SQL task and Data Flows are the usual offenders which will jump out from the analysis. They do a lot of heavy lifting and are vulnerable to resource contention in their own ways.

There’s more to a package than just those though. If the dynamic duo above aren’t the problem there are still things to look out for in your packages:

File operations can be slow depending on hardware or local vs remote disks
External tasks such as interfacing between on-premise and cloud infrastructure
Third-party extensions may cause issues if you have specific need for them and they’re poorly supported or maintained
Database drivers are included above too, especially proprietary ones
Logging configuration can be reviewed to ensure it’s not excessive
Increase MaxConcurrentExecutables slightly if the package would benefit from increased parallelism

Wrap up

In this post we’ve taken the next step to optimise and improve ETL performance in SSIS packages. We’ve built on the great work started last week of identifying performance bottlenecks, and taken it a step further to look at solving those pain points.

Whilst the challenges can be complex and varied, there’s a good variety to get started with here. Begin with blocking transformations for data flows, and low-hanging query and index changes for SQL tasks. They’ll provide the best return on time.

You can chip away at performance issues every day if you’re looking for them. Remember that each fix compounds the benefits, but there will be a point of diminishing returns.

Give them a try and let me know what you think.

2 replies on “Solving Bottlenecks in SSIS Packages”

[…] Andy Brownsword has some advice: […]

[…] background in SSIS featured heavily – from identifying bottlenecks, knowing where to look for solutions, to solving blocking, and optimisations including buffer sizes, caching, more caching, and […]

Data flows

Execute SQL task

Other tasks

Wrap up

Share this:

Related

2 replies on “Solving Bottlenecks in SSIS Packages”

Leave a comment Cancel reply