Thursday, November 30, 2017

Hey! What's the deal with SQL Server NOCOUNT and T-SQL WHILE loops?

(This testing was performed on SQL Server 2016 SP1 CU0.)
OK. I like logging a lot of information for my tests.

So I created two-three logging tables for perf stat captures :-)


DROP TABLE IF EXISTS count_reg_2;
CREATE TABLE count_reg_2
(test_start     DATETIME,
 session_id     INT,
 login_time     DATETIME,
 scheduler_id   INT,
 cpu_id         INT,
 parent_node_id INT,
 memory_node_id INT,
 elapsed_ms     INT,
 iter           BIGINT,
 cpu_ms         BIGINT,
 session_end    DATETIME)
WITH (data_compression = page);

DROP TABLE IF EXISTS count_session_waits_2;
CREATE TABLE count_session_waits_2
(test_start          DATETIME,
 session_id          INT,
 wait_type           NVARCHAR(256),
 waiting_tasks_count BIGINT,
 wait_time_ms        BIGINT,
 max_wait_time_ms    BIGINT,
 signal_wait_time_ms BIGINT)
WITH (data_compression = page);

DROP TABLE IF EXISTS count_system_waits_2;
CREATE TABLE count_system_waits_2
(capture_tm          DATETIME,
 wait_type           NVARCHAR(256),
 waiting_tasks_count BIGINT,
 wait_time_ms        BIGINT,
 max_wait_time_ms    BIGINT,
 signal_wait_time_ms BIGINT)
WITH (data_compression=page);

And I've got a stored procedure for my test workload. This is after a few revisions - there's a couple of important comments in there that maybe someday I'll be able to explain more adequately :-) Why count to 8000000 in the loop by default? It was an early estimate for 10 seconds of solid CPU work on this system, and I wasn't motivated enough to bump it higher once I saw that it fell a little short once NOCOUNT was ON. But I don't want to spill the beans early.

So... here's the stored procedure.


CREATE OR ALTER PROCEDURE timed_count__reg_2 @maxcount INT = 8000000, @test_start DATETIME, @NOCOUNT VARCHAR(10) = 'OFF', @syswaits VARCHAR(10) = 'OFF'
AS
DECLARE @start_time DATETIME,
        @end_time   DATETIME,    
        @iter       INT = 0;

WAITFOR DELAY '00:00:01'; -- session waits don't always flush for this test!?! more reliably flush with a WAITFOR

IF @NOCOUNT = 'ON' SET NOCOUNT ON;

SET @start_time = GETDATE();
IF @syswaits = 'ON'
INSERT INTO count_system_waits_2
SELECT @start_time, ws.*
FROM sys.dm_os_wait_stats ws
WHERE ws.waiting_tasks_count > 0;

WHILE @iter < @maxcount SET @iter = @iter + 1;

SET @end_time = GETDATE();
IF @syswaits = 'ON'
INSERT INTO count_system_waits_2
SELECT @end_time, ws.*
FROM sys.dm_os_wait_stats ws
WHERE ws.waiting_tasks_count > 0
OPTION (MAXDOP 1);

INSERT INTO count_reg_2
SELECT @test_start, @@spid, sess.login_time, req.scheduler_id, sch.cpu_id, sch.parent_node_id, nde.memory_node_id, 
       DATEDIFF(ms, @start_time, @end_time), @iter, sess.cpu, @end_time
FROM   sys.dm_exec_requests req
JOIN   sys.dm_os_schedulers sch  ON req.scheduler_id = sch.scheduler_id 
JOIN   sys.dm_os_nodes      nde  ON sch.parent_node_id = nde.node_id
JOIN   sys.sysprocesses     sess ON req.session_id = sess.spid  --hey!! important to use sysprocesses here rather than exec_sessions, better reporting of cpu time
WHERE  req.session_id = @@spid
OPTION (MAXDOP 1);

INSERT INTO count_session_waits_2
SELECT @test_start, @@spid,  sws.wait_type, sws.waiting_tasks_count,  sws.wait_time_ms,
       sws.max_wait_time_ms, sws.signal_wait_time_ms
FROM   sys.dm_exec_session_wait_stats sws
WHERE  sws.session_id = @@spid
OPTION (MAXDOP 1);
--sql_sasquatch

OK.  Now Aaron Bertrand has a post from February 2016...
Performance Surprises and Assumptions : SET NOCOUNT ON
https://sqlperformance.com/2016/02/t-sql-queries/nocount

In that blog post the potential performance benefit of NOCOUNT ON was elusive.  And, to be honest, this round of my tests was NOT to learn about NOCOUNT but other stuff.  Just happened to stumble on NOCOUNT when I was sifting through stuff after some early testing.  But with these results in hand, maybe NOCOUNT has a few more performance surprise for all of us :-)

OK, still need a driver for the tests.  I'm old as the dinosaurs so I usually throw together a .bat script before Powershell.

I put together a pair of .bat scripts, one for NOCOUNT OFF (default) and one for NOCOUNT ON. These scripts resided on my laptop, and will be executed from my laptop, with a remote connection to SQL Server via sqlcmd. They looked like this.

nocount_off__test.bat
set date1=%date:~4%
set date2=%date1:/=-%
set datetime='%date2% %time%'
START /B sqlcmd -dSASQUATCH -SMountain_home -Q "EXEC timed_count__reg_2 8000000, %datetime%, 'OFF', 'ON';" > nul

nocount_on__test.bat
set date1=%date:~4%
set date2=%date1:/=-%
set datetime='%date2% %time%'
START /B sqlcmd -dSASQUATCH -SMountain_home -Q "EXEC timed_count__reg_2 8000000, %datetime%, 'ON', 'ON';" > nul

The timed_count__reg_2 stored procedure does a fair bit of self-logging, but whenever possible I like to keep eyes-on from outside the individual test sessions as well.  Plus, I really like perfmon.

So, on the SQL Server RDBMS VM I have a Sasquatch directory.  Its got a .bat script for starting logman and a text file for the counters to log.  And a .bat script to stop the counter collection. For this type of test, I typically use a one second collection interval.



Fire up the Sasquatch collector.  First executed the script with NOCOUNT OFF.  Then executed the script with NOCOUNT ON.  Stopped logman.  Let's see what I've got.



I'm glad that worked out.  The first test was with NOCOUNT OFF.  It ran about 50 seconds.  The second test was with NOCOUNT ON.  It ran really hot on the CPU, in less than 10 seconds. There's something goofy about the CPU vs elapsed numbers for that particular run... that'll have to be for another day.

Notice the difference in the first test between the end ASYNC_NETWORK_IO waiting_tasks_count and the beginning count: 194332 - 186696 = 7636

That's exactly the ASYNC_NETWORK_IO waiting_tasks_count for the first test's session_id 55.
Same with wait_time_ms and signal_wait_time_ms.  Nifty :-)

And no ASYNC_NETWORK_IO waits recorded during the 2nd test with NOCOUNT ON.  Not at session level, not at system level.

So NOCOUNT made a big difference here.  Huh.

Well - can perfmon tell us anything in addition to what we've already seen?  Yep. First of all, today was the day I learned that the perfmon wait counters only update every 5 seconds :-).  But the other thing perfmon shows us is that the ASYNC_NETWORK_IO waits that occur while NOCOUNT is OFF occur throughout the WHILE LOOP.  And... the total wait time isn't determined by the work being done, but by network latency from SQL Server to the client.  If the same test is run within the VM, a small amount of ASYNC_NETWORK_IO wait occurrences and total wait time might be observed.  But it'll be far less than what I show in these tests.  And... this particle test run was really tame.  Earlier in the week I ran a similar test from same laptop, same VM and saw the variance from under 10 seconds to over 10 minutes.  Gotta go for now though...

No comments:

Post a Comment