Last time out we started to look at optimising SSIS packages by showing how to identify bottlenecks with a handy script. This time we’re turning insights into action to solve those pain points.
The solutions are grouped into 3 areas: Data Flows, as they do a lot of heavy lifting; the Execute SQL task, which can also be used for transformation and calculations; and finally everything else (because the first two are usually the issue).
Data flows
Data flows are a likely culprit for bottlenecks as they can perform large amounts of work. Their performance depends on the source and destination locations, and particularly in how we’re manipulating the data in between.
It can be helpful to change the SSIS logging to ‘Performance’ or a custom level to dig into these. That will populate the Execution Performance report and allow you to identify components in the flow which are causing bottlenecks.
Here are some posts which might help solve your performance issues:
- Removing blocking transformations improves flows dramatically
- I covered a specific example of removing sort transformations which reduced execution by nearly 50%
- Replacing multiple lookups with a cache is more efficient when using the same source
- Reduce resources required for a large lookup / cache using a selective caching approach
- Right-sizing data flow buffers can help with throughput if there isn’t a single bottleneck
- Review data sizes to avoid spilling to disk
If you want to dive into the data flow further, consider debugging options within Visual Studio to step through the flow.
Update: Kevin made a fantastic point that sometimes data flows just aren’t the right place for transformation. Moving a cumbersome flow into a database engine might be the right call.
Execute SQL task
I know we’re looking at SSIS bottlenecks here, but a good portion of packages may be waiting for your SQL engine to complete some work. That doesn’t mean we can’t help things along.
There’s a vast amount of advice online to help with SQL performance issues. At a very broad level, it’ll come down to how the query is written, and what indexes are available to use. They’re not small topics.
Whilst I can’t cover those topics in depth here, below are some posts I think may be particularly helpful:
- Review missing index recommendations to consider
- Apply covering indexes which are very beneficial in the right conditions
- Indexing foreign key columns can be broadly helpful (also be aware of impacts too)
- Refactor or use indexed computed columns to solve non-sargable predicates
- Check for implicit conversion which could be hiding in procedures
- Try using
EXISTSinstead if the query usesDISTINCTas a crutch - Allowing for multiple execution plans can help if you’re a victim of parameter sniffing
If contention is wider across the environment you’ll want specific tools to help out. Query store is a game changer for quickly finding performance outliers to tackle. There are 3rd party solutions such as the fantastic First Responder Kit (repo) (and Brent’s demo) which can give you a head-start for ÂŁfree too.
Other tasks
In my experience the Execute SQL task and Data Flows are the usual offenders which will jump out from the analysis. They do a lot of heavy lifting and are vulnerable to resource contention in their own ways.
There’s more to a package than just those though. If the dynamic duo above aren’t the problem there are still things to look out for in your packages:
- File operations can be slow depending on hardware or local vs remote disks
- External tasks such as interfacing between on-premise and cloud infrastructure
- Third-party extensions may cause issues if you have specific need for them and they’re poorly supported or maintained
- Database drivers are included above too, especially proprietary ones
- Logging configuration can be reviewed to ensure it’s not excessive
- Increase
MaxConcurrentExecutablesslightly if the package would benefit from increased parallelism
Wrap up
In this post we’ve taken the next step to optimise and improve ETL performance in SSIS packages. We’ve built on the great work started last week of identifying performance bottlenecks, and taken it a step further to look at solving those pain points.
Whilst the challenges can be complex and varied, there’s a good variety to get started with here. Begin with blocking transformations for data flows, and low-hanging query and index changes for SQL tasks. They’ll provide the best return on time.
You can chip away at performance issues every day if you’re looking for them. Remember that each fix compounds the benefits, but there will be a point of diminishing returns.
Give them a try and let me know what you think.
One reply on “Solving Bottlenecks in SSIS Packages”
[…] Andy Brownsword has some advice: […]