
Deeper into Concurrency Contention

We’ve recently looked at how single-threaded concurrency is handled in SQL Server, and followed up last time by diving into multithreaded concurrency.

This time we’re going to go a bit further with the multithreading to look more into how the contention is handled when the number of cores available doesn’t line up with our workload. We’ll also have a brief look at the additional challenges which can rear their heads when dealing with parallel workloads.

Setting up

We’ll continue with the same example we used previously for testing parallel execution. Our cost threshold for parallelism (CTFP) will be left at the default of 5, and MAXDOP will be set to 0 at the server level.
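If you want to confirm those settings on your own instance before testing, both are visible in sys.configurations (a quick check, assuming you have VIEW SERVER STATE-style permissions):

```sql
/* Check the server-level parallelism settings.
   value_in_use shows the currently active value. */
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'max degree of parallelism',
               N'cost threshold for parallelism');
```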

The SQL to create our test data and demonstrate our scenarios is below if you wanted to try this out for yourself:

/* Create table for demonstration */
CREATE TABLE dbo.ObjectIDs (
    Object1 BIGINT NOT NULL,
    Object2 BIGINT NOT NULL,
    Object3 BIGINT NOT NULL
);
GO
 
/* Populate the table with 10,000 records */
INSERT INTO dbo.ObjectIDs (
    Object1,
    Object2,
    Object3
)
SELECT
    TOP (10000)
    o1.object_id,
    o2.object_id,
    o3.object_id
FROM
    sys.objects o1
    CROSS JOIN sys.objects o2
    CROSS JOIN sys.objects o3;
GO
 
/* Query to demonstrate with */
SELECT SUM(x1.Object1 + x1.Object2 + x2.Object1 + x2.Object2)
FROM dbo.ObjectIDs x1
    CROSS JOIN dbo.ObjectIDs x2
OPTION (MAXDOP 2);

In our previous multithreaded examples we had two tasks, each operating on two parallel threads, and we ran these together with 2 and then 4 cores in use. Let’s see how this situation works with an odd number of cores: 3.

CPU swinging

Running our example while utilising only 3 cores means the two executions will have to share CPU at least a little.
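One way to restrict the engine to 3 schedulers for this kind of testing is process affinity. This is a sketch only – it’s an instance-wide setting, so not something to run against a production server, and as discussed later in this post the engine won’t necessarily use the cores the way you’d expect:

```sql
/* Restrict the engine to 3 schedulers (CPUs 0 to 2) for testing.
   This affects the whole instance. */
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 2;

/* Revert to letting SQL Server use all available CPUs afterwards */
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = AUTO;
```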

Recapping what we saw previously:

  • Utilising 4 cores to run 2 queries with 2 threads meant each could have its own core and we didn’t need to wait for CPU to become available
  • Using 2 cores to run the same 2 queries – each still using 2 threads – meant they had to share the cores, and each query took twice as long to run as a result

With no contention we saw our queries execute in around 25 seconds so that’s our baseline to work from here too.

So now we’ll try the same again with 3 cores in use. How long will the queries take to execute now?

CPU when sharing 3 cores between 2 queries

Any ideas? Let’s go back to the wait stats and break them down:

Wait stats when sharing 3 cores across 2 queries using 2 threads

So we’ve got around 30 seconds of runtime. How come?

It isn’t the case that 2 queries are each slicing equal time on the CPU; rather, we’ve got 4 threads which are each receiving an equal share of time. For example, threads 1, 2 and 3 may use the CPU, followed by threads 4, 1 and 2, then followed by 3, 4 and 1, and so on.

We know from before that 2 queries using 2 threads across 4 cores take around 25 seconds. So 25 seconds x 2 threads x 2 queries = ~100 seconds of work across both queries. Spreading that load evenly over 3 cores gives just over 33 seconds, which is in line with the ~30 seconds we’ve observed here.

Based on the wait stats we can see the wait for CPU (SOS_SCHEDULER_YIELD) is reporting 12 seconds. We saw last week that this figure is the total across both threads, so it equates to around 6 seconds added to the query’s elapsed time – pretty much the difference between our 25-second baseline and the 30 seconds observed here.
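If you want to pull those numbers for yourself while a test query is running, sys.dm_exec_session_wait_stats (available from SQL Server 2016) reports waits scoped to a single session. The session_id below is a hypothetical placeholder – substitute the one running your test query:

```sql
/* Per-session wait stats; run from a second connection.
   signal_wait_time_ms is the portion spent waiting to get
   back onto the CPU after becoming runnable. */
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 55  /* hypothetical: your test query's session */
  AND wait_type IN (N'SOS_SCHEDULER_YIELD', N'CXPACKET');
```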

We need to be thinking in terms of threads and not queries.

There’s a misconception that the number of cores a query will use needs to be locked before the process can execute. If that were the case we’d see only 2 of the 3 cores in use at any time, because each query would need to isolate 2 cores to itself.

That wraps up our examples for now but there are some caveats I wanted to cover around parallel execution.

More threads, more problems

When we’re dealing with parallel execution, it isn’t just the extra processing threads consuming CPU that we have to account for.

Last time out I said that the CXPACKET wait type was SQL Server waiting for parallel execution to complete. But the question is, what was waiting?

Well, it was another thread. Thread 0 to be specific.

If we take a look at our actual execution plan and start digging into the parallel operators we’ll see Thread 0 being listed but not doing an awful lot. Below are some results from the table scan operator in our example:

We can see Thread 0 reporting no reads, no CPU time, and no rows being returned.

Thread 0 is a synchronisation thread that is used by SQL Server to coordinate the threads which are actually doing the work. This is what is waiting for parallel execution to complete and causes our CXPACKET wait type.

This is where it gets a little tricky when it comes to testing parallelism. SQL Server likes to keep cores free for the synchronisation thread to operate on so it can coordinate those worker threads effectively.

You see, my example queries here (and the previous post) weren’t completely honest.

If you tried to set up the scenario above by assigning only 3 cores to the engine, what you’d likely find is that both queries execute on the same 2 cores, with the 3rd core kept out of query processing and left to the synchronisation thread.
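You can watch this happening via sys.dm_os_tasks, which shows the scheduler each thread of a running request has landed on. This is a sketch – the session_id filter is a rough way to exclude system sessions, and exec_context_id 0 is the coordinating Thread 0 discussed above:

```sql
/* See which scheduler (core) each thread of the running
   queries landed on while the test queries execute. */
SELECT session_id,
       exec_context_id,   /* 0 = the coordinating Thread 0 */
       scheduler_id,
       task_state
FROM sys.dm_os_tasks
WHERE session_id > 50      /* crude filter for user sessions */
ORDER BY session_id, exec_context_id;
```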

Here are how the threads were allocated on my environment in most cases (your mileage may vary):

  • 2 cores: queries shared 2 cores for execution
  • 3 cores: queries shared 2 cores, 1 was unused
  • 4 cores: queries shared 2 cores, 2 were unused
  • 5 cores: queries shared 3 cores, 2 were unused
  • 6 cores: queries consumed 4 cores, 2 were unused

To make this more interesting, I saw different cores being used after restarting the server, and sometimes the queries would use more or fewer cores – for example, with 4 cores allocated the queries would sometimes execute on 3 of them, leaving only 1 unused.

For this reason, the examples you’ve seen in this series of posts have actually been designed based on the number of cores being used for processing rather than the number of cores allocated to the engine.

There are smarter folks out there who may be able to explain why and how the cores are managed. What I do know is that at scale when you’ve got production workloads, the environment will balance itself out much better and this isn’t really an issue, but it’s handy to understand what’s happening under the hood a little.

Wrap up

Here we’ve extended our demonstration from last week and looked at how concurrent parallel execution behaves when the number of cores doesn’t line up evenly with our queries. The takeaway is that we aren’t balancing queries across our CPU cores; we’re balancing threads.

We’ve also touched on the synchronisation thread and how that adds to the total number of threads which the engine is managing and so can interfere with the thread to core assignments.

I hope you’ve found these posts looking at concurrency in SQL Server to be useful. This is one of the foundations for query execution and is a starting point for understanding where some of our bottlenecks and wait stats originate from and what we may want to consider to improve those situations.

If you’ve missed either of the other posts around concurrency then you can find those below:
