
More Caring, More Sharing: Handling Datasets within Sessions

When building data solutions we’ll be handling sets of data which may need to be imported, transformed, aggregated or exported, amongst other things. As solutions grow more complex, we may need to share this data between different parts of the process.

Last week we looked at options for sharing individual values within a session. This week we’re scaling that up to whole datasets and looking at what we can leverage given the increased volume of data.

Solving with temporary tables

When writing our code we may look at using variables to store our data. Whilst this may be sufficient within a single batch, it doesn’t work across multiple batches within a session.

We’re looking at datasets here, so below is an example with a table variable; the syntax is flagged even before we execute, which already suggests it’s not looking promising:

Table variables don't work across multiple batches
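As the original screenshot isn’t reproduced here, below is a minimal sketch of what that attempt looks like (the variable name is illustrative):

```sql
-- Batch 1: declare and populate a table variable
DECLARE @MyVars TABLE (ID INT, Val INT);
INSERT INTO @MyVars VALUES (1, 1);
GO

-- Batch 2: the variable went out of scope when the previous batch ended
SELECT * FROM @MyVars;
-- Msg 1087: Must declare the table variable "@MyVars".
```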

This is one of the more straightforward limitations we may come up against. We aren’t able to share variables with subsequent batches (this is also true for regular value-type variables) as their scope is limited to the batch they’re created in. We can however move our data into a temporary (temp) table, which will persist across batches, for example:

CREATE TABLE #MyVars (ID INT, Val INT);
INSERT INTO #MyVars VALUES (1, 1);
SELECT * FROM #MyVars;
GO

UPDATE #MyVars SET Val = 2 WHERE ID = 1;
SELECT * FROM #MyVars;
GO

Outside of using a # instead of an @, and a slight change to the table definition, there isn’t a great deal of difference in how these are scripted, so it’s usually a pretty simple swap.

Solving with nested procedures

So we can use temp tables to share data across batches within a session; now let’s consider that in the context of stored procedures. Below are a couple of procedures, along with calls to them, where the first creates the temp table and the second tries to use it:

CREATE OR ALTER PROC dbo.Step1 AS
BEGIN
	CREATE TABLE #TempTable (
		ID INT IDENTITY(1, 1),
		SomeNum INT NOT NULL,
		SomeText VARCHAR(50) NOT NULL
	);

	INSERT INTO #TempTable
	VALUES (1, 'One'), (2, 'Two');
END
GO

CREATE OR ALTER PROC dbo.Step2 AS
BEGIN
	SELECT * FROM #TempTable;
END
GO

EXEC dbo.Step1;
EXEC dbo.Step2;
GO

Msg 208, Level 16, State 0, Procedure dbo.Step2, Line 4 [Batch Start Line 51]
Invalid object name '#TempTable'.

When we try to use temp tables in this context, the table doesn’t remain between the two procedure calls. When the first procedure completes, the temp table goes out of scope and is dropped before the second procedure executes.

A way to alleviate this is to nest the stored procedures so that dbo.Step1 calls dbo.Step2 within its scope, which keeps the table available to the second procedure.

The modified code for this approach would look as follows:

CREATE OR ALTER PROC dbo.Step1 AS
BEGIN
	CREATE TABLE #TempTable (
		ID INT IDENTITY(1, 1),
		SomeNum INT NOT NULL,
		SomeText VARCHAR(50) NOT NULL
	);

	INSERT INTO #TempTable
	VALUES (1, 'One'), (2, 'Two');

	/* Add the nested call */
	EXEC dbo.Step2;
END
GO

CREATE OR ALTER PROC dbo.Step2 AS
BEGIN
	SELECT * FROM #TempTable;
END
GO

EXEC dbo.Step1;

Whilst this approach allows us to share the table of data across the procedures, it can be tough to troubleshoot them, as dbo.Step2 can’t be called without first setting up a temp table in the correct state.
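One workaround when debugging is to stage the table manually; a sketch, relying on the fact that a temp table created at session scope is visible inside procedures called from that session:

```sql
-- Create the temp table directly in the session, mimicking dbo.Step1's setup
CREATE TABLE #TempTable (
	ID INT IDENTITY(1, 1),
	SomeNum INT NOT NULL,
	SomeText VARCHAR(50) NOT NULL
);
INSERT INTO #TempTable VALUES (1, 'One'), (2, 'Two');

-- dbo.Step2 can now see the session-scoped table and be debugged in isolation
EXEC dbo.Step2;

DROP TABLE #TempTable;
```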

Solving with global temp tables

We can take the temp table approach even further by using global temp tables. Whilst temp tables are a common sight, the use of global ones is less frequent as they can be accessed by multiple sessions. These are declared by prefixing the table name with ## rather than the single # of a temp table.

The scope for these tables lasts until the creating session is closed and any other sessions have finished referencing the table. Because of this we can solve the above procedure issue using them instead, for example:

CREATE OR ALTER PROC dbo.Step1 AS
BEGIN
	CREATE TABLE ##TempTable (
		ID INT IDENTITY(1, 1),
		SomeNum INT NOT NULL,
		SomeText VARCHAR(50) NOT NULL
	);

	INSERT INTO ##TempTable
	VALUES (1, 'One'), (2, 'Two');
END
GO

CREATE OR ALTER PROC dbo.Step2 AS
BEGIN
	SELECT * FROM ##TempTable;
END
GO

EXEC dbo.Step1;
EXEC dbo.Step2;

As a global temporary table exists across sessions, processes which use one need to be designed carefully to avoid interference from other processes which may be running. For example, a second instance of the same process might expect to create a fresh copy of the table and fail because it already exists.
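A sketch of that collision, assuming the first session’s table is still in scope when the second instance starts:

```sql
-- Session 2, while session 1's ##TempTable still exists
CREATE TABLE ##TempTable (
	ID INT IDENTITY(1, 1),
	SomeNum INT NOT NULL,
	SomeText VARCHAR(50) NOT NULL
);
-- Msg 2714: There is already an object named '##TempTable' in the database.
```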

Intricate (and convoluted) processes which make heavy use of these tables can get messy when they go wrong, and planning for global temp tables rather than regular ones adds further layers of complexity.

Similar to nested procedures this approach can again be hard to troubleshoot as the table and associated data may need to be created and manually put into a desired state before the task can be debugged.

Solving with a staging table

If we’re at the point where we want to exchange a good amount of data across a number of batches, sessions or procedures, then it may be worth looking at a staging table to hold this data.

This approach would involve us creating a table which would persist in our database either for the duration of our process, or potentially reside in the database permanently if it’s a job which runs frequently enough.

Using this approach allows any code to be decoupled from how the data is shared – you wouldn’t need to consider which procedures are nested, or make sure temp tables are created and dropped appropriately.
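The one design detail worth sketching is scoping rows to a particular run so that concurrent instances of the process don’t collide; the table and column names below are illustrative:

```sql
-- A persistent staging table, keyed so concurrent runs don't interfere
CREATE TABLE dbo.ProcessStaging (
	RunID UNIQUEIDENTIFIER NOT NULL,
	ID INT IDENTITY(1, 1),
	SomeNum INT NOT NULL,
	SomeText VARCHAR(50) NOT NULL
);

-- Each run tags its own rows (the RunID would typically be passed
-- between procedures as a parameter)
DECLARE @RunID UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.ProcessStaging (RunID, SomeNum, SomeText)
VALUES (@RunID, 1, 'One'), (@RunID, 2, 'Two');

-- Any batch or procedure can then read back just that run's data
SELECT SomeNum, SomeText
FROM dbo.ProcessStaging
WHERE RunID = @RunID;
```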

By having a table within our database which persists our data, we can also better troubleshoot failures, as the state remains even after a failure. Additionally, in the event we need to replay the data over time, we may be able to achieve that by restoring database backups and transaction logs.

I don’t think there’s much need for showing how to create a table to add records into and retrieve them. We get the idea.

Wrap up

Here we’ve taken a look at a few different approaches for sharing volumes of data within a session. We’ve covered various temporary methods as well as the persisted option of a staging table.

The temporary methods rely on crafting our code in specific ways to ensure the object stays in scope and can always be accessed. Where this code has evolved over time, these solutions can become very complex.

Whilst I’d use temporary tables within a single batch or procedure, if I needed a solution which spanned multiple batches or procedures I’d personally lean towards a staging table. This would remove the reliance each piece of code would have on preceding ones and decouple the overall solution to make it more manageable.

If you’d like to see the previous post where we looked at parameterisation, using Context Info, and Session Context you can find the post here.

How about folks out there, have you had a need to share volumes of data within a session like this or have you steered away from this pattern in favour of something else?
