Recently I came across a stubborn blocking issue that led me to discover something interesting about the way Nested Loops join operators work in certain situations.

This was a production environment running critical databases on SQL Server 2022 on Azure VMs. The databases ranged between 10 TB and 30 TB in size across a couple of SQL Server 2022 instances, and they used Change Tracking heavily.

There was a massive backlog in the internal CT tables for the change tracking cleanup thread to work through. The cleanup thread had been failing silently for a long period of time, which caused the internal CT tables to bloat heavily. The internal tables grew large enough to have an undesirable impact on general workloads and caused some odd performance issues. It was decided to change the isolation level to Snapshot for the sessions reading from the CT tables and to turn on trace flags 8286 and 8287 on these instances. See more on these trace flags in the Microsoft documentation.

https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-traceon-trace-flags-transact-sql?view=sql-server-ver16
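
For illustration, here is a minimal sketch of what those two remediation steps amount to. The database name is a placeholder, and the flags are assumed to be enabled globally; this is not the exact change script that was run.

-- Allow snapshot isolation so that sessions reading from the CT tables can use it
-- (database name is a placeholder).
ALTER DATABASE [YourCTDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- Readers of the CT data then opt in per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
GO

-- Enable the change tracking cleanup trace flags globally (-1):
DBCC TRACEON (8286, -1);
DBCC TRACEON (8287, -1);
GO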

The steps above helped alleviate the situation at hand and allowed the auto cleanup thread to catch up over the next few weeks. However, a disturbing trend then started to show up.

Every time the auto cleanup thread woke up, massive blocking chains formed on these database servers, lasting more than a couple of minutes on certain tables, which was reason enough to ring the alarm bells.

A quick Extended Events session was enough to confirm that the auto cleanup thread was causing lock escalation on the internal change tracking side tables. But why would this happen?
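
A minimal sketch of the kind of session used for that check is below; the file path and database_id are placeholders.

CREATE EVENT SESSION [DBA_TrackLockEscalation]
    ON SERVER
ADD EVENT sqlserver.lock_escalation
    (
        ACTION(sqlserver.database_name,
               sqlserver.session_id,
               sqlserver.sql_text)
        WHERE ([database_id] = (7))
    )
ADD TARGET package0.event_file(SET filename = N'FilePath.xel')
WITH (EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      STARTUP_STATE = OFF)
GO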

The batch size of the delete statement causing the lock escalation was less than 5000.

As per general understanding and guidance, lock escalation kicks in when a single statement acquires more than about 5000 locks on a single object, or when lock memory exceeds a certain threshold, and the latter would hardly be a concern on these 4 TB RAM boxes.

https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/performance/resolve-blocking-problems-caused-lock-escalation

Also, the XE session clearly indicated that the reason for the lock escalation was the number of locks being held.
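
A quick way to sanity-check that count while the cleanup is running is to look at the locks currently held by the cleanup session. This is a hedged sketch; 408 stands in for the auto cleanup session id identified in the next step.

-- Count locks currently held by the cleanup session, grouped by object and mode
-- (replace 408 with the auto cleanup session id identified below).
SELECT resource_type,
       resource_associated_entity_id,
       request_mode,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 408
GROUP BY resource_type, resource_associated_entity_id, request_mode
ORDER BY lock_count DESC;
GO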

To understand this further, I took the following steps –

I captured the change tracking cleanup events using the Extended Events session below. This session exposed the session id under which the auto cleanup thread was running.

CREATE EVENT SESSION [DBA_TrackChangeTrackingCleanup]
    ON SERVER
ADD EVENT sqlserver.change_tracking_cleanup
    (
        ACTION(package0.collect_system_time,
               sqlos.task_time,
               sqlserver.client_hostname,
               sqlserver.database_id,
               sqlserver.database_name,
               sqlserver.session_id,
               sqlserver.sql_text,
               sqlserver.tsql_stack,
               sqlserver.username)
        WHERE ([database_id] = (7))
    ),
ADD EVENT sqlserver.syscommittab_cleanup
    (
        ACTION(package0.collect_system_time,
               sqlos.task_time,
               sqlserver.client_hostname,
               sqlserver.database_id,
               sqlserver.database_name,
               sqlserver.session_id,
               sqlserver.sql_text,
               sqlserver.tsql_stack,
               sqlserver.username)
        WHERE ([database_id] = (7))
    )
ADD TARGET package0.event_file(SET filename = N'FilePath.xel',
        max_file_size = (100),
        max_rollover_files = (20))
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS,
      MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = ON,
      STARTUP_STATE = OFF)
GO
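
Once created, the session can be started and the captured events read back from the file target. This is a small sketch, keeping the placeholder file path used in the session definition.

ALTER EVENT SESSION [DBA_TrackChangeTrackingCleanup] ON SERVER STATE = START;
GO

-- Read the raw events back; the session_id action on each event identifies
-- the session under which the auto cleanup thread runs.
SELECT CAST(event_data AS XML) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'FilePath*.xel', NULL, NULL, NULL);
GO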

Thereafter, I configured the query_post_execution_plan_profile event for the change tracking auto cleanup session and compared the execution plans across multiple servers.

CREATE EVENT SESSION [CT_planprofile]
    ON SERVER
ADD EVENT sqlserver.query_post_execution_plan_profile
    (
        ACTION(sqlos.scheduler_id,
               sqlserver.database_id,
               sqlserver.is_system,
               sqlserver.plan_handle,
               sqlserver.query_hash_signed,
               sqlserver.query_plan_hash_signed,
               sqlserver.server_instance_name,
               sqlserver.session_id,
               sqlserver.session_nt_username,
               sqlserver.sql_text)
        -- 408 = session id of the auto cleanup thread, as exposed by the previous session
        WHERE ([sqlserver].[database_id] = (7)
               AND [sqlserver].[session_id] = 408)
    )
ADD TARGET package0.event_file(SET filename = N'FilePath.xel',
        max_file_size = (100),
        max_rollover_files = (20))
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS,
      MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = ON,
      STARTUP_STATE = OFF)
GO
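
The captured plans can be read back from the file target the same way and opened in SSMS. This is a hedged sketch: the name of the showplan payload field ("showplan_xml") is an assumption that may differ by event and version, so adjust the XQuery path to whatever the captured events actually contain.

;WITH xe AS
(
    SELECT CAST(event_data AS XML) AS x
    FROM sys.fn_xe_file_target_read_file(N'FilePath*.xel', NULL, NULL, NULL)
)
SELECT x.value('(event/@timestamp)[1]', 'datetime2') AS event_time,
       -- assumed payload field name; verify against the captured events
       x.query('(event/data[@name="showplan_xml"]/value/*)[1]') AS query_plan
FROM xe;
GO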

I noticed a common thread across the servers where we were facing this blocking.

The execution plan for the auto cleanup thread was the same across all of them, and each one was running a Nested Loops join with a prefetch.
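
Prefetching shows up in the showplan XML as the WithUnorderedPrefetch (or WithOrderedPrefetch) attribute on the Nested Loops operator, so a rough way to spot other affected plans is a string match against the plan cache, as in the sketch below. It scans the plan cache, so run it sparingly on a busy box.

-- Rough match: the attribute may also appear with a value of "false",
-- so inspect the returned plans before drawing conclusions.
SELECT cp.plan_handle,
       cp.usecounts,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%WithUnorderedPrefetch%';
GO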

I recalled having read about prefetch in the past and started digging around the internet for more.

I came across some in-depth and comprehensive articles detailing why and how a Nested Loops join running with prefetch can lead to such blocking. Essentially, when prefetching is used, SQL Server decides to hold locks for a longer duration (whereas under Read Committed isolation it normally releases a lock as soon as it has read the row), so enough locks can accumulate during a single statement to trigger lock escalation.

Below are a few well-written articles which explain this concept in detail –

https://sqlperformance.com/2022/10/sql-performance/lock-escalation-threshold-part-3

SQL Server Prefetch and Query Performance

https://techcommunity.microsoft.com/t5/sql-server-blog/read-committed-and-bookmark-lookup/ba-p/383293

https://techcommunity.microsoft.com/t5/sql-server-blog/random-prefetching/ba-p/383485

Eventually, it was decided to turn off trace flags 8286 and 8287, and thereafter a change in the execution plans was observed.
No more nested loops with prefetch and no more blocking. Issue fixed.
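
For reference, turning the flags back off and verifying their status amounts to something like the following, assuming they were enabled globally as described earlier.

DBCC TRACEOFF (8286, -1);
DBCC TRACEOFF (8287, -1);
DBCC TRACESTATUS (8286, 8287);
GO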