Recently we looked at single threaded concurrency in SQL Server for workloads which only need a single core to execute. We saw how they ran at the same time with the queries effectively being round-robined on the CPU.
This time out I want to look at how concurrency behaves when we’re working with multithreaded workloads.
Setting up
We’ll follow the same method to adjust processor affinity as we did previously. In this instance we’ll be changing that affinity as we go to see the impact.
To get our query running in parallel we’ll keep the default settings for parallelism (that’s best practice, I’m sure…), so we’ll get things running over multiple cores easily:
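For reference, affinity changes of this sort can be made with ALTER SERVER CONFIGURATION. A minimal sketch is below; the CPU numbers are purely illustrative, so adjust them to whatever exists on your own test box:

```sql
/* Restrict SQL Server's schedulers to 4 logical CPUs (0-3).
   CPU numbers here are illustrative only. */
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 3;

/* Reset to the default of using all CPUs once finished testing */
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = AUTO;
```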

The query we’ll be using for this test is a little different. We’ll start with a table storing 10,000 records, and our queries will cross join two instances of that table together. Both parts can be seen below:
/* Create table for demonstration */
CREATE TABLE dbo.ObjectIDs (
    Object1 BIGINT NOT NULL,
    Object2 BIGINT NOT NULL,
    Object3 BIGINT NOT NULL
);
GO

/* Populate the table with 10,000 records */
INSERT INTO dbo.ObjectIDs (
    Object1,
    Object2,
    Object3
)
SELECT TOP (10000)
    o1.object_id,
    o2.object_id,
    o3.object_id
FROM sys.objects o1
CROSS JOIN sys.objects o2
CROSS JOIN sys.objects o3;
GO

/* Query to demonstrate with */
SELECT SUM(x1.Object1 + x1.Object2 + x2.Object1 + x2.Object2)
FROM dbo.ObjectIDs x1
CROSS JOIN dbo.ObjectIDs x2
OPTION (MAXDOP 2);
You may have spotted that the demonstration query specifies MAXDOP 2. This is purposeful: it keeps the thread count small enough to keep an eye on with a small number of cores, and I know the query will run within 30 seconds, giving a reasonable volume of workload.
Our demonstrations will be focussed on two queries running at the same time, how long it takes for them to execute, what they’re waiting on when executing, and why that’s the case.
Let’s get after it.
Side-by-side execution
We’ll start with a simple side-by-side run as a baseline. With 4 cores available to complete the execution, two queries using two cores each is no problem at all:

Now when we look at the wait stats on the execution plan we’re immediately seeing something different to single threaded execution – we’ve got a new wait type appearing:

We still have a few milliseconds of SOS_SCHEDULER_YIELD, which we saw before, but now we have 26 seconds of CXPACKET wait. This new wait type accrues at the exchange (CX) operators that pass rows between parallel threads; in effect it’s SQL Server waiting for the parallel execution to wrap up.
It’s nothing we need to worry about in this instance, it’s simply a metric we can use to see how long the parallelism was running for. In this case after a few dry runs we’ll take our baseline for execution as around 25 seconds.
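If you’d rather pull the waits from a DMV than from the execution plan, something along these lines works on SQL Server 2016 and later; the session id is just a placeholder for whichever session is running the query:

```sql
/* Per-session wait totals; replace 55 with the session id of interest */
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 55
ORDER BY wait_time_ms DESC;
```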
Shared execution
For our next example we’ll repeat what we did in our single threaded workload, where both workloads shared the same resource. In this case that means we’ll run two queries, each over 2 threads, sharing the same 2 cores.
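To set that scenario up, the affinity gets narrowed to 2 CPUs before both sessions kick off. Again this is just a sketch and the CPU numbers are illustrative:

```sql
/* Both queries' threads (4 in total) now share just these 2 logical CPUs */
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 1;
```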
Let’s see how that looks when running along with the wait stats once complete:


So we had two queries sharing the same 2 cores, so we’d expect each to take twice as long. They clocked in at 49 seconds, which is as expected. Let’s dive into those wait stats more.
Firstly we see the headline wait stat is CXPACKET again, accrued while we waited for the parallel execution to complete. Next up we have SOS_SCHEDULER_YIELD which, interestingly, clocks in at around the same time as the entire execution. How can we have been waiting on CPU for the full duration of the query?
Explaining parallel waits
It’s time to jump briefly into parallel wait statistics; I think they deserve their own section.
When we execute queries across multiple threads like our examples here, we can have multiple threads performing different volumes of work. Due to this, wait statistics are collected separately by each thread.
An example of this could be if we want to read data in a table and have some pages in memory but some need to be read from disk. If we then have two threads reading the table, one thread may want more of the pages already in memory so that thread would wait for less time on the disk whereas the other thread would be waiting on the disk for much longer.
With each thread collecting its own statistics, the totals can appear amplified. In our test we had two 25-second queries run together, each taking around 50 seconds of elapsed time: 25 seconds running and 25 seconds waiting. With two threads per query, each thread waited around 25 seconds, so each query accrued a combined 50 seconds of waiting for the CPU.
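The same amplification shows up in the server-wide wait DMV, since every thread’s wait time lands in the totals there too. A quick way to watch the two wait types from this post (numbers accumulate per thread, not per query, across the whole instance):

```sql
/* Server-wide totals accumulate per thread, so parallel
   workloads inflate these figures relative to elapsed time */
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('CXPACKET', 'SOS_SCHEDULER_YIELD');
```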
When you have an environment which has a lot of queries running in parallel it’s key to understand how these metrics will present themselves. This knowledge will help you better interpret statistics from your environments and how to respond to them.
Wrap up
Here we’ve taken a look at the basics of concurrent parallel execution in SQL Server. We’ve taken examples where all threads have their own cores to work on and compared to what that looks like when we need the threads to share fewer cores.
We’ve also included a look at the different wait type we’ll see with parallel execution and also discussed more broadly how wait times are calculated when dealing with multiple threads.
That isn’t the whole tale though; there’s more to tell with this story. Next time we’ll come back and look at what happens when we’re dealing with an uneven number of cores and how that impacts execution.