Tuning Data Flow Buffer Sizes to Improve Performance in SSIS

When using data flows in SSIS packages we want the flow to be fast and fluid. How much performance can be improved varies from package to package, but one element consistently helps: tuning memory utilisation.

In this post we’ll look at tuning the memory utilisation by altering the number of rows passing through the flow concurrently. Specifically we’re looking at the following properties:

  • DefaultBufferSize
  • DefaultBufferMaxRows

How buffers work

When data passes through the flow it does so in a buffer which contains a number of records. The data flow engine uses DefaultBufferSize and DefaultBufferMaxRows to determine how many records that is.

Both of these properties set a limit for the amount of data which can fit into a buffer:

  • DefaultBufferSize sets the memory limit for records in a single buffer
  • DefaultBufferMaxRows sets the maximum number of rows in a single buffer.

The number of records placed into a buffer will be the largest number that doesn't exceed either of these two limits.

The default values for these settings are:

  • DefaultBufferSize of 10MB memory for the buffer
  • DefaultBufferMaxRows of 10,000 rows for the buffer

Whichever of these is more restrictive for your data will throttle the number of records in the buffer. For example, if a record is 100 bytes, roughly 100,000 records could fit into a 10MB buffer, but the maximum of 10,000 rows would be the limitation instead.
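
As a rough illustration (not the engine's exact algorithm), the effective number of rows per buffer can be thought of as the smaller of the two limits. Here's a minimal sketch of that arithmetic in Python, with the 100-byte record as an assumed example:

```python
# Rough illustration of how the two limits cap a buffer (example values, not a real package).
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # default DefaultBufferSize: 10MB
DEFAULT_BUFFER_MAX_ROWS = 10_000         # default DefaultBufferMaxRows

row_size_bytes = 100                     # hypothetical 100-byte record

rows_by_memory = DEFAULT_BUFFER_SIZE // row_size_bytes        # ~104,857 rows fit by memory
rows_per_buffer = min(rows_by_memory, DEFAULT_BUFFER_MAX_ROWS)

print(rows_per_buffer)  # 10000 - the row limit is the restriction here
```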

Adjusting these settings provides opportunities for improvement.

Tuning tips

Each data flow will have its own ‘sweet spot’ for these values. The key variable here is the data size of your records.

  • A record with few columns or small data types may benefit from an increased DefaultBufferMaxRows to allow more rows into the buffer
  • A record with many columns or large data types may need an increase to DefaultBufferSize to provide more memory for additional rows

If these properties are set too large or too small they can impact performance.

The first step in optimising is to calculate the rough size of your records. You can do this by inspecting the data flow and reviewing the data types in the metadata:

Metadata for a data flow showing data types for each field

Based on the data types shown here we can make a rough calculation of the memory required per row. Below is a list of some of the more common data types and their sizes (a row-size estimator sketch follows the list):

  • DT_I4 – 4 byte integer
  • DT_I8 – 8 byte integer (BIGINT)
  • DT_DATE – 8 bytes
  • DT_DBDATE – 4 bytes
  • DT_DBTIMESTAMP – 8 bytes
  • DT_STR – Length x 1 byte
  • DT_WSTR – Length x 2 bytes as it's Unicode
  • DT_DECIMAL – 12 bytes
  • DT_NUMERIC – 16 bytes
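
As a rough aid for that calculation, the sizes above can simply be summed across the columns in the flow. Here's a minimal sketch in Python, where the column list and string lengths are hypothetical examples rather than real package metadata:

```python
# Rough per-row size estimate from SSIS buffer data types (sizes as listed above).
# Declared string lengths are used; any engine-level overhead is ignored.
TYPE_SIZES = {
    "DT_I4": 4,
    "DT_I8": 8,
    "DT_DATE": 8,
    "DT_DBDATE": 4,
    "DT_DBTIMESTAMP": 8,
    "DT_DECIMAL": 12,
    "DT_NUMERIC": 16,
}

def column_size(data_type: str, length: int = 0) -> int:
    if data_type == "DT_STR":
        return length          # 1 byte per character
    if data_type == "DT_WSTR":
        return length * 2      # 2 bytes per character (Unicode)
    return TYPE_SIZES[data_type]

# Hypothetical columns as they might appear in a data flow's metadata view
columns = [
    ("OrderID", "DT_I8", 0),
    ("OrderDate", "DT_DBTIMESTAMP", 0),
    ("CustomerName", "DT_WSTR", 100),
    ("Amount", "DT_NUMERIC", 0),
]

row_size = sum(column_size(dtype, length) for _, dtype, length in columns)
print(f"Estimated row size: {row_size} bytes")  # 8 + 8 + 200 + 16 = 232 bytes
```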

Once we know the length of each record we can understand which of the two properties is limiting the throughput. Divide 10,000,000 (roughly the default 10MB DefaultBufferSize) by the number of bytes in your record; a small sketch of this check follows the list below. Assuming you're using the default values:

  • If the result is less than 10,000 then the DefaultBufferSize is limiting
  • If the result is greater than 10,000 then the DefaultBufferMaxRows is limiting
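
Carrying on with that rough arithmetic, here's a small check of which default is the constraint for a given row size (again an illustration, not the engine's exact behaviour; the example row sizes are assumed):

```python
# Which default property is the constraint for a given row size? (rough check)
DEFAULT_BUFFER_SIZE_BYTES = 10_000_000   # the approximation of the 10MB default used above
DEFAULT_BUFFER_MAX_ROWS = 10_000

def limiting_property(row_size_bytes: int) -> str:
    rows_by_memory = DEFAULT_BUFFER_SIZE_BYTES // row_size_bytes
    if rows_by_memory < DEFAULT_BUFFER_MAX_ROWS:
        return "DefaultBufferSize"      # memory fills up before the row cap is reached
    return "DefaultBufferMaxRows"       # row cap is reached before memory fills up

print(limiting_property(232))    # DefaultBufferMaxRows - small rows, so the row cap binds
print(limiting_property(2_000))  # DefaultBufferSize - large rows, so memory binds
```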

So what should the limits be set to?

Testing the limits

I want to demonstrate the difference and impact that adjusting these values can make. The data I'll be using for testing has a row size just under 1KB, so the default buffer limits of 10MB and 10,000 records are a good balance.

I'll vary both values and take an average timing for a batch of circa 9 million records through a data flow. DefaultBufferSize will vary the buffer memory across 1MB, 5MB, 10MB, 25MB, 50MB, and 100MB. DefaultBufferMaxRows will vary the record limit across 1,000, 5,000, 10,000, 25,000, 50,000, and 100,000.
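
For reference, a permutation run like this can be scripted rather than re-edited by hand in the designer, by overriding the properties with dtexec's /Set option. The sketch below is only illustrative: the package path, the data flow task name 'Data Flow Task', and the property paths are placeholders that would need to match your own package, and it assumes a .dtsx file run directly with dtexec /F:

```python
import itertools
import subprocess
import time

# Assumed package path and data flow task name - adjust to match your own package.
PACKAGE = r"C:\SSIS\BufferTest.dtsx"
TASK_PATH = r"\Package\Data Flow Task"

buffer_sizes_mb = [1, 5, 10, 25, 50, 100]
max_rows_values = [1_000, 5_000, 10_000, 25_000, 50_000, 100_000]

for size_mb, rows in itertools.product(buffer_sizes_mb, max_rows_values):
    args = [
        "dtexec", "/F", PACKAGE,
        "/Set", f"{TASK_PATH}.Properties[DefaultBufferSize];{size_mb * 1024 * 1024}",
        "/Set", f"{TASK_PATH}.Properties[DefaultBufferMaxRows];{rows}",
    ]
    start = time.perf_counter()
    subprocess.run(args, check=True)               # run the package with the overrides applied
    elapsed = time.perf_counter() - start
    print(f"{size_mb}MB buffer / {rows} rows: {elapsed:.1f}s")
```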

The chart below shows the time taken to run the data flow with the permutations above. The horizontal axis shows the batch size, the vertical axis the time taken for the data flow (seconds), and there are separate series for each memory setting.

From this analysis we can determine the following:

  • A very low DefaultBufferMaxRows will consistently perform slowly, as shown by the example limit of 1,000
  • A very small DefaultBufferSize will consistently perform badly, as shown by the 1MB buffer
  • Setting both values very large can lead to a large degradation in performance, as shown by the 50k and 100k batches with 50MB and 100MB buffers respectively
  • Smaller increases in settings are more likely to yield performance benefits than drastic changes
  • When DefaultBufferSize and DefaultBufferMaxRows are in near balance but at larger values there can be spikes of degraded performance, as shown with 50k rows and a 50MB buffer; the spike subsides with a 100MB buffer

Looking at this example as a whole, the default settings are actually very competitive overall. With smaller data sets or simpler data flows there's often no need to adjust them.

However, the best performance across all the samples (25k batches with a 50MB buffer) was over 20% faster than the default settings.

When you have more complex data flows or are moving a considerable volume of records, it really can pay to experiment with these settings.

Wrap up

In this post we’ve looked at how the DefaultBufferSize and DefaultBufferMaxRows properties of a data flow can be adjusted to improve performance without needing to change the contents of the flow.

Blindly changing these settings or making considerable changes can lead to performance degradation. Here’s how to make measured changes to find the right limits:

  • 🧮 Calculate the row size for your data flow
  • 📏 Determine which of the settings is the current limiting factor
  • 🪜 Incrementally move that towards an optimal setting, whilst monitoring performance
  • 💡 Note it’s beneficial to keep the memory slightly higher than needed due to the final observation above
  • 📈 Once a balance is achieved, then start incremental increases in both values and monitor until benefits in performance plateau

Just because a data flow is performing slowly doesn't mean these values are the cause. There can be other elements to consider, such as blocking transformations.

It's important to make sure we're fixing the correct problem.
