Thursday, November 30, 2017

Hey! What's the deal with SQL Server NOCOUNT and T-SQL WHILE loops?

(This testing was performed on SQL Server 2016 SP1 CU0.)
OK. I like logging a lot of information for my tests.

So I created two-three logging tables for perf stat captures :-)

CREATE TABLE count_reg_2
(test_start     DATETIME,
 session_id     INT,
 login_time     DATETIME,
 scheduler_id   INT,
 cpu_id         INT,
 parent_node_id INT,
 memory_node_id INT,
 elapsed_ms     INT,
 iter           BIGINT,
 cpu_ms         BIGINT,
 session_end    DATETIME)
WITH (data_compression = page);

DROP TABLE IF EXISTS count_session_waits_2;
CREATE TABLE count_session_waits_2
(test_start          DATETIME,
 session_id          INT,
 wait_type           NVARCHAR(256),
 waiting_tasks_count BIGINT,
 wait_time_ms        BIGINT,
 max_wait_time_ms    BIGINT,
 signal_wait_time_ms BIGINT)
WITH (data_compression = page);

DROP TABLE IF EXISTS count_system_waits_2;
CREATE TABLE count_system_waits_2
(capture_tm          DATETIME,
 wait_type           NVARCHAR(256),
 waiting_tasks_count BIGINT,
 wait_time_ms        BIGINT,
 max_wait_time_ms    BIGINT,
 signal_wait_time_ms BIGINT)
WITH (data_compression=page);

And I've got a stored procedure for my test workload. This is after a few revisions - there's a couple of important comments in there that maybe someday I'll be able to explain more adequately :-) Why count to 8000000 in the loop by default? It was an early estimate for 10 seconds of solid CPU work on this system, and I wasn't motivated enough to bump it higher once I saw that it fell a little short once NOCOUNT was ON. But I don't want to spill the beans early.

So... here's the stored procedure.

CREATE OR ALTER PROCEDURE timed_count__reg_2 @maxcount INT = 8000000, @test_start DATETIME, @NOCOUNT VARCHAR(10) = 'OFF', @syswaits VARCHAR(10) = 'OFF'
        @end_time   DATETIME,    
        @iter       INT = 0;

WAITFOR DELAY '00:00:01'; -- session waits don't always flush for this test!?! more reliably flush with a WAITFOR


SET @start_time = GETDATE();
IF @syswaits = 'ON'
INSERT INTO count_system_waits_2
SELECT @start_time, ws.*
FROM sys.dm_os_wait_stats ws
WHERE ws.waiting_tasks_count > 0;

WHILE @iter < @maxcount SET @iter = @iter + 1;

SET @end_time = GETDATE();
IF @syswaits = 'ON'
INSERT INTO count_system_waits_2
SELECT @end_time, ws.*
FROM sys.dm_os_wait_stats ws
WHERE ws.waiting_tasks_count > 0

INSERT INTO count_reg_2
SELECT @test_start, @@spid, sess.login_time, req.scheduler_id, sch.cpu_id, sch.parent_node_id, nde.memory_node_id, 
       DATEDIFF(ms, @start_time, @end_time), @iter, sess.cpu, @end_time
FROM   sys.dm_exec_requests req
JOIN   sys.dm_os_schedulers sch  ON req.scheduler_id = sch.scheduler_id 
JOIN   sys.dm_os_nodes      nde  ON sch.parent_node_id = nde.node_id
JOIN   sys.sysprocesses     sess ON req.session_id = sess.spid  --hey!! important to use sysprocesses here rather than exec_sessions, better reporting of cpu time
WHERE  req.session_id = @@spid

INSERT INTO count_session_waits_2
SELECT @test_start, @@spid,  sws.wait_type, sws.waiting_tasks_count,  sws.wait_time_ms,
       sws.max_wait_time_ms, sws.signal_wait_time_ms
FROM   sys.dm_exec_session_wait_stats sws
WHERE  sws.session_id = @@spid

OK.  Now Aaron Bertrand has a post from February 2016...
Performance Surprises and Assumptions : SET NOCOUNT ON

In that blog post the potential performance benefit of NOCOUNT ON was elusive.  And, to be honest, this round of my tests was NOT to learn about NOCOUNT but other stuff.  Just happened to stumble on NOCOUNT when I was sifting through stuff after some early testing.  But with these results in hand, maybe NOCOUNT has a few more performance surprise for all of us :-)

OK, still need a driver for the tests.  I'm old as the dinosaurs so I usually throw together a .bat script before Powershell.

I put together a pair of .bat scripts, one for NOCOUNT OFF (default) and one for NOCOUNT ON. These scripts resided on my laptop, and will be executed from my laptop, with a remote connection to SQL Server via sqlcmd. They looked like this.

set date1=%date:~4%
set date2=%date1:/=-%
set datetime='%date2% %time%'
START /B sqlcmd -dSASQUATCH -SMountain_home -Q "EXEC timed_count__reg_2 8000000, %datetime%, 'OFF', 'ON';" > nul

set date1=%date:~4%
set date2=%date1:/=-%
set datetime='%date2% %time%'
START /B sqlcmd -dSASQUATCH -SMountain_home -Q "EXEC timed_count__reg_2 8000000, %datetime%, 'ON', 'ON';" > nul

The timed_count__reg_2 stored procedure does a fair bit of self-logging, but whenever possible I like to keep eyes-on from outside the individual test sessions as well.  Plus, I really like perfmon.

So, on the SQL Server RDBMS VM I have a Sasquatch directory.  Its got a .bat script for starting logman and a text file for the counters to log.  And a .bat script to stop the counter collection. For this type of test, I typically use a one second collection interval.

Fire up the Sasquatch collector.  First executed the script with NOCOUNT OFF.  Then executed the script with NOCOUNT ON.  Stopped logman.  Let's see what I've got.

I'm glad that worked out.  The first test was with NOCOUNT OFF.  It ran about 50 seconds.  The second test was with NOCOUNT ON.  It ran really hot on the CPU, in less than 10 seconds. There's something goofy about the CPU vs elapsed numbers for that particular run... that'll have to be for another day.

Notice the difference in the first test between the end ASYNC_NETWORK_IO waiting_tasks_count and the beginning count: 194332 - 186696 = 7636

That's exactly the ASYNC_NETWORK_IO waiting_tasks_count for the first test's session_id 55.
Same with wait_time_ms and signal_wait_time_ms.  Nifty :-)

And no ASYNC_NETWORK_IO waits recorded during the 2nd test with NOCOUNT ON.  Not at session level, not at system level.

So NOCOUNT made a big difference here.  Huh.

Well - can perfmon tell us anything in addition to what we've already seen?  Yep. First of all, today was the day I learned that the perfmon wait counters only update every 5 seconds :-).  But the other thing perfmon shows us is that the ASYNC_NETWORK_IO waits that occur while NOCOUNT is OFF occur throughout the WHILE LOOP.  And... the total wait time isn't determined by the work being done, but by network latency from SQL Server to the client.  If the same test is run within the VM, a small amount of ASYNC_NETWORK_IO wait occurrences and total wait time might be observed.  But it'll be far less than what I show in these tests.  And... this particle test run was really tame.  Earlier in the week I ran a similar test from same laptop, same VM and saw the variance from under 10 seconds to over 10 minutes.  Gotta go for now though...

Tuesday, November 14, 2017

#tsql2sday 96 Reflecting on those that guided my path...

T-SQL Tuesday is always a bright spot for me, whether I write up a post myself or not.  If you read my blog but have never participated... consider it!! Its a fairly low-friction way to get in with a good group!

Today SQL Server bloggers from all over are reflecting on those that have inspired, guided and influenced them.  So I'll name a few names and share some stuff.  Over time, I'll be able to share even more names, maybe a story or a few.  These folks are heroes to me... but many of them I also consider friends. Which makes me a very blessed individual.

I got a late start working with software - after grad school and a bit of wandering.  The name CR Boardman won't be familiar in SQL Server circles, but his personality looms large in memories of my early years working with software.  He loves investigation, and has a fierce determination for rooting out technical problems.  His problem-solving approach knows no boundaries: he'd trace a problem workflow on the database side, then track down the client-side code responsible for making the RPC code calls on the database.  He refused to be siloed - although the organizational responsibility for a given code module or feature may lie within a different group, that never meant trouble-shooting and investigation had to stop at an arbitrary border.  I remember some late nights in conference rooms, working through some challenging problems.  More than any particular problem-solving approach or tool, I remember his enthusiasm and excitement.  The last time we worked together was nearly 10 years ago now - I hope my work now embodies those characteristics I admired in CR.

Andy Yun (@SQLBek) has been mentioned in a number of blog posts today, and I'll mention him too!  Andy's put together a few presentations in the last couple of years that many people have really benefited from.  He's not satisfied to simply enhance his own speaking resume, though.  Andy does a great job of encouraging folks to get involved and share their experiences, expertise and knowledge.  In my case, he's employed some friendly needling to make sure I don't crawl back into a cave to avoid blogging and speaking 😊.  Yep, sometimes I've absolutely needed that.  Now I hope to not only remain on the path of sharing my adventures through Twitter, my blog, and some presentations... I hope to encourage others to do the same.  Maybe even do some considerate needling... if I have the guts😊.

Niko Neugebauer (@NikoNeugebauer) has a lot of applied energy.  A real lot! Niko is often a catalyst - causing others around him to also start engaging with technical problems and solutions in new ways.  When you spend time with Niko, a striking devotion comes to the fore: a devotion to constant forward motion.  If there's time left today, there's time left to try something new, or learn something new for tomorrow.  Going to discuss blog or presentation ideas?  Niko will probably ask you what new contributions will be in the presentation - how are you going to keep things moving forward?  This might give some insight into how he has generated 115 blog posts to date on SQL Server columnstore indexes.  I hope my appreciation for innovation is as apparent as it is in Niko.  His energy and enthusiasm convinced me to participate in Tuga IT for the last two years.  And I plan to keep participating in the future!
Ewald Cress (@sqlOnIce) is hosting the T-SQL Tuesday posts this time around - check out his blog for a round-up of all of the posts for the November 14th 2017 adventure... and a lot of other creative SQLOS deep-dives.
Ewald has helped to guide my approach to my blog and to speaking opportunities - though he may not know the extent.  He and I both tackle questions that tend to be off the beaten path, though from different perspectives.  He's made some pretty deep topics into creative blog posts and presentations (which I'll hopefully get to see sometime). His approach to providing details on the parts we might otherwise take for granted impresses me - and hopefully I'll be able to fill out some of my blog posts in a similar manner.

That's all I'll write for tonight... as I mentioned before there are additional names... and stories... which will almost certainly come spilling out of my over a shared meal - or maybe even on the pages of this blog.

Ciao for now!

Thursday, November 9, 2017

Windows paging file usage & SQL Server 2017 RTM: sys.dm_os_sys_memory is Tricky

OK... using my laptop SQL Server 2017 RTM CU1 instance to show something tricky about reconciling paging space usage.  The SQL Server 2016 versions I checked briefly displayed the same behavior I describe below.

A new tool was added to the belt recently - I like the idea.  Since SQL Server will respond to server level low memory messages, I like being able to see the state of server memory.  For that has Microsoft wrought sys.dm_os_memory.

sys.dm_os_sys_memory (Transact-SQL)

There's good stuff in there for investigation of OOM scenarios.

Let's see what turns up on my laptop.

Based on the column names and values above, seems natural to think:
total_page_file_kb - available_page_file_kb = used page file kb
11027476 kb - 3047668 kb = 7979808 kb

Holy cow! Is my laptop using nearly as much paging space as there is RAM on the laptop??
Weird. If something forced that much paging space use relative to RAM on the laptop... I certainly wouldn't expect system_memory_state_desc = 'Available physical memory is high'!!

Lets double-check by using wmic in an administrator cmd.exe.

OK... its clear already something is afoot.  AllocatedBaseSize is the current size of the pagefile indicated in 'Description', in mb.  So... 2880 mb is the size of the pagefile.  And only 1684 mb currently used, with a peak usage of 2752 mb.  Almost 8 gb of pagefile used as reported by SQL Server sys.dm_os_sys_memory?  Nope - no way.

So what's up?  Sometimes memory accounting in SQL Server takes some work. 😀😀😀

I stared at the numbers for a bit and had a hunch... double-checked and it tied out.

11027476 kb - 8077528 kb = 2949948 kb ≈ 2880mb


total_page_file_kb - total_physical_memory_kb =AllocatedBaseSize

Aha!  Looks like total_page_file_kb column is misnamed. Because...

total_physical_memory_kb + AllocatedBaseSize = total_page_file_kb

So total_page_file_kb is really more like 'total virtual memory kb'!

OK.   Let's look back at the definitions in BoL.

sys.dm_os_sys_memory (Transact-SQL)

The description of available_page_file_kb is wrong. Since total_page_file_kb is described as the commit limit - or the virtual memory as sum of physical memory + paging space - the available_page_file_kb is then not best defined as an available portion of the paging file but as an available portion of the commit limit or virtual memory.

All right - enough for now.  See you again soon!

Wednesday, November 8, 2017

SQL Server 2016/2017 Soft-NUMA Part I

Just what you probably wanted - another "part 1" 😂😂😂 .

I'll start looking at soft-NUMA in SQL Server 2016 & 2017 - probably 3 or 4 blog posts to get where I want to go for now.  Let's start by looking at soft-NUMA on my dual core + HT laptop on SQL Server 2017 RTM CU1.

How does the soft-NUMA and NUMA configuration look right now? Plain - like you'd expect.  A single memory_node_id, single node_id.  4 scheduler_ids, 4 cpu_ids associated with that node_id/memory_node_id.

How about the lazy writer, transaction log writer, and resource monitor sessions?  Yep, they're pretty plain.  One each. Whaddya expect out of a little laptop? 😅

Pssssst. Before I did this testing, I backed up the registry. Just in case.  Please do that if you decide to follow along - even if its on your own laptop.

OK... let's add manually configured soft-NUMA to the mix. I'll use these commands in an administrator cmd.exe...

REG DELETE "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration" /f
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3"
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node0" /v CPUMask /t REG_DWORD /d 0x001
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node1" /v CPUMask /t REG_DWORD /d 0x002
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node2" /v CPUMask /t REG_DWORD /d 0x004
REG ADD "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\140\NodeConfiguration\Node3" /v CPUMask /t REG_DWORD /d 0x008

First command above deletes a key and all of its subkeys (clearing any previous manual soft-NUMA config for SQL Server 2017). If the NodeConfiguration key doesn't exist, the benign error indicating it couldn't be deleted is seen.

Then NodeConfiguration and Node* keys are added. Finally, CPUMask values are added to each of the Node* keys - REG_DWORD data type and hex value for the CPUMask for each soft-NUMA node.

This is what that looks like...

Now, the soft-NUMA configuration can't catch hold until the next time SQL Server starts up.  So let's stop and start SQL Server from the cmd.exe.

OK. First stop - the SQL Server error log.  Hmmm.  That looks promising.

OK, now lets look at memory_node_ids & node_ids again.  There's still a single memory node - that makes sense, because this laptop is a single NUMA node construct. But now there are 4 node_ids, each with a single scheduler_id & cpu_id associated. Rather than a single node_id with all scheduler_ids/cpu_ids associated.

All right - lets look at lazy writer, tx log writer, and resource monitor again. Now there are 4 txlog writers - that's the maximum.  The documentation previously said that SQL Server 2016 would have one txlog writer per NUMA node (up to 4), on NUMA node 0.  My little laptop has the maximum number of txlog writers possible for SQL Server 2016 or 2017 😏 Apparently the txlog writers scale with soft-NUMA nodes rather than NUMA nodes.  That's probably not optimal, given that there is a cost to coordinating transaction log write activity among the multiple txlog writers.  What's worse, though, is that all 4 txlog writers are on the same cpu_id!  All of the added cost to distribute/coordinate work among 4 txlog writers, none of the benefits of spreading the writers across multiple schedulers or cores.  That's almost certainly a net loss for efficiency.

But wait! There's more! Still got a lazy writer on cpu_id 0, too.  Imagine a high write-throughput workload.  The four txlog writers will be contending with each other for CPU time on cpu_id 0.  (That will drive up logwrite waits as the txlog writers experience wait time to get on CPU.  The lazy writer will be contending with the txlog writers for CPU time as well.

There's another session on cpu_id 0 - before even starting to look at user sessions.  There's a Resource Monitor session on cpu_id 0 as well.  Under heavy load, CPU 0 is gonna be in trouble - so too will anyone waiting on cpu_id 0. For example, queries with significant logwrite wait time will see their logwrite waits amplified.

Using soft-NUMA to distribute work among CPUs is sometimes a very important strategy.  Several txlog writer threads sharing the same CPUs is probably not a good idea without some amelioration.  One thing that can be explored is using server or resource pool affinity to keep user connections and parallel tasks away from node 0 = cpu_id = 0.

There are cases where multiple txlog writers can be beneficial - but keeping them all on the same cpu_id as in this case is almost certainly a net loss. Here's a Connect item...

All right - that's it for now.  But I'll be back with more soft-NUMA fun in a little bit...

Tuesday, November 7, 2017

Investigating OOMs on SQL Server 2016 SP1 CU4 - Part I

Today OOMs were reported on a physical server with 1.5TB of RAM, running SQL Server 2016 SP1 CU4.  I started to poke around a bit... pretty sure we'll need to set up perfmon and maybe a SQL agent job to capture numbers every 5 minutes.  Capturing the memory clerk numbers seems like the place to be.

Here's the SQL if you want to play along at home. But I'm too sleepy to do anything with it tonight.

;WITH clerk_kb 
     AS (SELECT [type],
                --roll up 'ACRUserStore%' & 'SecCtxtACRUserStore' numbers
                CASE WHEN [name] LIKE 'ACRUserStore%' THEN 'ACRUserStore' 
                     WHEN [name] LIKE 'SecCtxtACRUserStore%' THEN 'SecCtxtACRUserStore' 
                ELSE [name] END clerk_name, 
                memory_node_id, pages_kb
        FROM sys.dm_os_memory_clerks)
SELECT GETDATE() AS capture_tm, clerk_name, memory_node_id, SUM(pages_kb) pages_kb
FROM clerk_kb
--keep the two clerks per database - dbname and {objperm + dbname} - out of results 
WHERE NOT EXISTS (SELECT 1 FROM sys.databases sdb 
                  WHERE PATINDEX('%' + sdb.[name] + '%', clerk_name) > 0)
GROUP BY clerk_name, memory_node_id
HAVING SUM(pages_kb) > 40000
ORDER BY clerk_name, memory_node_id;

Monday, November 6, 2017

SQL Server 2017 - soft-NUMA limits MAXDOP? Nope.

While at PASS Summit this year (2017), I was in a discussion about distribution of SQL Server tasks and connections.  It's a common theme with me - I often work with batch workloads and I want work as evenly distributed over available compute resources as possible, for as long as possible, in order to minimize elapsed workload time.

I mentioned that I was planning to set up a soft-NUMA node for each vcpu on a 16 vcpu VM, to evenly distribute incoming connections and thus DOP 1 queries over vcpus.  Thomas Kejser et al used this strategy to good effect in "The Data Loading Performance Guide", which used SQL Server 2008 as a base.

My conversation partner cautioned me that leaving this soft-NUMA configuration in place after the specialized workload would result in DOP 1 queries whether I wanted them or not.  The claim was, effectively, a parallel query plan generated by a connection within a soft-NUMA node would have its MAXDOP restricted by the scheduler count (if lower than other MAXDOP contributing factors).  Though I wasn't able to test at the time, I was skeptical: I'd always thought that soft-NUMA was consequential to connection placement, but not to MAXDOP nor to where parallel query workers would be assigned.

I'm back home now... time to test!! This test is on Microsoft SQL Server 2017 (RC2) - 14.0.900.75, on my dual core (with HT) laptop.  I've set up soft-NUMA, with one soft-NUMA node for each scheduler/logical CPU.  Notice below that all of the soft-NUMA nodes (in sys.dm_os_nodes) are in a single memory node.  That's actually the single physical NUMA node of my laptop.

So what happens when a parallel query is attempted on this system with soft-NUMA nodes of 1 scheduler each? It still gets DOP 4. And the actual plan shows each of the 4 parallel workers doing some work.

For the sake of completeness, I also want to show that work is being distributed across the schedulers in this setup.  Let's create and populate a temp table for the tare.

Now let's run the test query 64 times.

OK, let's take the post-test measure and check the delta.

Excellent. So we've seen that even with soft-NUMA nodes of single schedulers, parallel queries are getting DOP > 1, and work is being distributed across all schedulers (although with short tasks and a single session, not very evenly).  I'm putting the code I used for this test below - although the most interesting part may be setting up the soft-NUMA itself.  I'll leave that for another day :-)

Ciao for now!

SELECT oss.scheduler_id, oss.cpu_id, osn.node_id, osn.memory_node_id, 
       osn.online_scheduler_mask, osn.online_scheduler_count 
FROM sys.dm_os_nodes osn
JOIN sys.dm_os_schedulers oss ON osn.node_id = oss.parent_node_id
WHERE osn.node_state_desc = 'ONLINE'
  AND oss.status = 'VISIBLE ONLINE';

    (capture_tm DATETIME,
     scheduler_id INT,
     total_cpu_usage_ms BIGINT
         (capture_tm, scheduler_id, total_cpu_usage_ms))

SELECT getdate() capture_tm, scheduler_id, total_cpu_usage_ms
FROM sys.dm_os_schedulers

DECLARE @ij INT = 1, @redirect INT;
WHILE @ij < 65
SELECT @redirect = SUM(1) FROM sys.dm_os_nodes os
CROSS JOIN sys.dm_os_nodes os2
OPTION (querytraceon 8649);
SET @ij = @ij +1;

SELECT getdate() capture_tm, scheduler_id, total_cpu_usage_ms
FROM sys.dm_os_schedulers WHERE status = 'VISIBLE ONLINE';

;WITH max_cte AS (SELECT scheduler_id, 
                         MAX(total_cpu_usage_ms) max_ms 
                  FROM #t GROUP BY scheduler_id),
      min_cte AS (SELECT scheduler_id, 
                         MIN(total_cpu_usage_ms) min_ms 
                  FROM #t GROUP BY scheduler_id)
SELECT max_cte.scheduler_id, 
       max_cte.max_ms - min_cte.min_ms AS delta_cpu_usage_ms
FROM max_cte 
JOIN min_cte ON max_cte.scheduler_id = min_cte.scheduler_id;