Categories
SQL

Introduction to Vertical Partitioning

When dealing with large volumes of data in SQL Server we may need to look for strategies to make querying and managing the data easier to handle.

A few months back we looked at an approach where we partitioned our data into multiple segments by applying partition functions and schemes to our tables. This helps us to manage and maintain the data easier whilst also providing some performance benefits at the same time.

An alternative approach which could be taken with the data is to split it vertically by partitioning the data in a different way which we’ll look at here.

What is it?

When we previously looked at partitioning we considered splitting the data based on values within our table, for example splitting sales data based on the year the sale was made. This is referred to as horizontal partitioning as we can imagine cutting the table horizontally. This approach still keeps the data within one table although it’s stored separately.

By partitioning vertically we can think of splitting the data based on the columns in the table. This would mean moving different columns into different tables to better support our workloads and be easier to maintain. This approach will result in multiple tables storing different sections of our data.

When partitioning horizontally each partition will contain the same columns but different records to other partitions. When we partition vertically each table would contain different columns but each would contain the same records.

What are the benefits?

By choosing to partition a set of data vertically it gives us freedom to restructure the data in a way that is more effective for us to use. Depending on the type of data we’re dealing with there are different ways this may be beneficial.

  • It may be useful to group related information together similar to normalisation. An example of this would be moving customer address details into a separate table to their account details.
  • We could choose to remove data which is used less into it’s own table so we wouldn’t need to retrieve it when querying other data. This could be the date that a customer was added to the system which wouldn’t be used regularly.
  • Another candidate for extracting into its own set of data would be fields taking up a lot of storage. This could include free-type fields such as comments or notes.

We may choose to split a single table down into two, three, or even more for very large tables. How the data should be structured is very individual to the business and how the data will be used.

Restructuring the data into multiple tables can allow us to query the specific sets of data which we want to retrieve without needing to consider the elements which aren’t relevant. This can help with less IO on the storage, a smaller memory footprint, and even smaller memory grants being allocated to queries.

The more data we have which falls into the categories above, the more the benefits of this approach are amplified and can produce considerable results.

How about the drawbacks?

As with most features, it doesn’t come without it’s drawbacks. The key challenge which partitioning vertically can bring is when we’re looking to read or write data after it’s been split into the different tables.

If we have a lot of data access into that table – which is why we may be partitioning in the first place – then this could require a considerable rewrite of queries or stored procedures to update them for the new schema.

It may be possible to utilise elements such as views to make the data appear to still be stored in the previous structure to resolve issues reading the data, however writing data into the new tables will need adjustments to support the changes. Depending on the scope of work or number of references which may need updating this can become prohibitive.

In addition to that, when splitting the data into multiple tables we’ll also be storing a common key across all of them so that they can be linked. This would require additional storage as it’ll be present in every new table created.

So long as we’re already using a relatively small data type for our key such as INT or BIGINT this may not be too much of an issue, particularly if we have such a volume of data where this approach may be beneficial to us.

Finally another area to consider when making these changes is indexing. As the data will be split across multiple tables it may not be possible to recreate some indexes which were previously present. In these cases you may need multiple indexes to achieve the same results which would need to be factored into the design of the new tables.

Wrap up

Here we’ve covered a few areas as an introduction, including what Vertical Partitioning is, the benefits which it can bring, nad and potential drawbacks with this approach.

Horizontal and Vertical partitioning are very different ways to achieve similar benefits. Both of these approaches require a good amount of effort to implement with large data volumes, and both will can improvements in the way we maintain and query our data if done well.

We’ll come back to this again to see an example of implementing Vertical Partitioning and how the process would look with some sample data. We can also take the opportunity when looking at some actual data to see the improvements which this can bring to our maintenance and queries.

One reply on “Introduction to Vertical Partitioning”

Leave a reply to Modifying Data in Multi Table Views – Andy Brownsword Cancel reply