Categories
SSIS

Beware Blocking Transformations in SSIS Data Flows

We have a platter of transformations to use when constructing data flows in SSIS packages. Not all transformations are equal though, and some can catch us out with performance impact as data volumes scale.

In this post we’ll look at blocking transformations which can trip us up if we’re not careful with them (or avoid them).

What are blocking transformations?

A blocking transformation refers to how certain transformations handle the data which passes through them.

A regular transformation can handle batches of records, perform their work, and pass their batch on to the next element in the data flow. They do this concurrently, for example a Derived Column transformation can evaluate the expression for each row, and the flow can continue as another batch is processed.

Blocking transformations however can’t handle one batch at a time. They require the entire dataset to complete their work, and so the flow is blocked. An example of this would be the Sort transformation where the output must be in order. Until it can evaluate all of the data it can’t be certain that the whole set is sorted.

Let’s look at the impact of this on our packages.

The impact of blocking transformations

Blocking transformations can present issues in a number of ways. Here are the most likely areas where you’ll spot their impact:

  • Performance degradation in the data flows. Execution time for the data flow can increase dramatically when a blocking transformation is introduced, and compounded if multiple are in place
  • Increased resource usage, particularly in terms of memory. Needing to handle the entire data set at once can lead to memory pressure and in extreme cases paging to disk which will degrade performance further
  • Scaling becomes increasingly difficult as data sets grow as a larger set of data requires appropriately larger resources to handle. Once limits are close to or reached, performance can fall off a cliff with little warning

Blocking transformations will cause extended runtime or and in extreme cases failure of packages due to the issues highlighted above. As mentioned, scaling becomes a challenge and we can find unexpected behaviour when executing packages as limits start to be reached.

So, what can we do to help remedy the blocking?

Dealing with blocking transformations

Removing blocking transformations can be challenging as they’ll be serving a specific purpose within the flow. Let’s start out by looking at some of the common blocking transformations:

  • The Sort transformation requires all rows before it can effectively sort the data
  • An Aggregate transformation need to process all records before it can output the aggregate values
  • A Merge Join transformation requires all of the data to perform the join, and as a bonus it also requires both sets of data to be sorted prior to processing

The situations where these transformations are needed can vary, and so can the solutions. There are a couple of broad options which should be considered to help tackle the impact of these:

Moving the work elsewhere can remove the need for using these transformations. For example if you need ordered data, instead of using a Sort transformation, the source query could be changed to perform the sorting in the database.

Breaking the work up can remove the need to handle all the data at once. For example if using an Aggregate to group sales by customer, a Loop Container could iterate over each customer and the data flow could aggregate them individually to reduce the amount of data needed at any one time.

We’ll come back to this subject and look at a specific example in a future post, where we can work through it in more detail.

Wrap up

In this post we’ve looked at transformations which can block our data flows and cause degradation or even failure of packages. We’ve looked at the impacts which this can cause – in case you’ve seen anything similar yourself – and touched on two key ways we can look at tackling the impacts on our package.

Blocking transformations can be particularly problematic as their impact is proportional to the data volume. This can mean disparities in performance between development and production environments. It can also present as slowly degraded performance of a package over time as data sets grow.

By refactoring how we do the work, we can remove these looming issues. Through carefully refactoring the flow and/or package, we’re able to produce more performant and consist results for our processes.

As mentioned above, we’ll revisit blocking transformations and look at a specific example of how they might be used and how we can make changes to avoid the impacts we’ve discussed here.

4 replies on “Beware Blocking Transformations in SSIS Data Flows”

Leave a reply to Tuning Data Flow Buffer Sizes to Improve Performance in SSIS – Andy Brownsword Cancel reply