Monday, September 14, 2015

I got a hunch about SQL Server Fragmentation Part V

Picking up from previous work in a recent post, which can be found here.

I got a hunch about SQL Server Fragmentation Part IV
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_14.html

Lets create two more tables, with the same structure as sq_CI.


CREATE TABLE [dbo].[sq_CI_2]
            ([ID_INT] [int] NOT NULL,
             [VARCHAR_4096] [varchar](4096) NULL
             CONSTRAINT [PK_sq_CI_2] 
   PRIMARY KEY CLUSTERED([ID_INT] ASC) ON [PRIMARY]
             ) ON [PRIMARY];
CREATE NONCLUSTERED INDEX nci_sq_CI_2 
       ON sq_CI_2 ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ); 

CREATE TABLE [dbo].[sq_CI_3]
            ([ID_INT] [int] NOT NULL,
             [VARCHAR_4096] [varchar](4096) NULL
             CONSTRAINT [PK_sq_CI_3] 
   PRIMARY KEY CLUSTERED([ID_INT] ASC) ON [PRIMARY]
             ) ON [PRIMARY];
CREATE NONCLUSTERED INDEX nci_sq_CI_3 
       ON sq_CI_3 ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 );

K. Now lets populate them.


INSERT INTO sq_CI_2 SELECT * FROM sq_CI ORDER BY ID_INT;
INSERT INTO sq_CI_3 SELECT * FROM sq_CI ORDER BY ID_INT;

How do they look?


SELECT   object_name(pstats.object_id) AS tname,
         index_type_desc,index_level,
         avg_fragment_size_in_pages,
         fragment_count,page_count,
         avg_fragmentation_in_percent 
FROM     sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE    pstats.object_id IN (object_id('sq_CI'),object_id('sq_CI_2'),object_id('sq_CI_3'))
ORDER BY pstats.object_id ;



I guess that's expected. The original table sq_CI is still optimized from previous work. The new tables sq_CI_2 and sq_CI_3 look pretty good  - but not as good as sq_CI.

Lets look a little closer, though, with Merrill Aldrich's File Layout Viewer*.
Table sq_CI (below) looks as expected - a large, mainly contiguous chunk for the clustered index PK_sq_CI, and another large, mainly contiguous chunk for nonclustered index NCI_sq_CI.


What about tables sq_CI_2 and sq_CI_3?  Here's what sq_CI_2 looks like in the viewer.  The more blue-based of the shades is the clustered index - several large chunks for the 4088 rows/pages.  The more red-based shade is the nonclustered index, with the same contiguity and interleaving as the clustered index. Table sq_CI_3 looks the same - is no other activity in this SQL Server instance, there aren't any other data structures to become interleaved with other than those I am creating, and there's enough contiguous free space in the single mdf file to allow SQL Server its first choice for extents and pages as growth occurs.


That's actually kinda boring, though.  I'm really interested in what happens if the two full-table inserts are happening concurrently.


TRUNCATE TABLE sq_CI_2;
TRUNCATE TABLE sq_CI_3;

So I'll use a .bat script to execute sqlcmd, with "START /b" so that a single script can launch both inserts nearly simultaneously**.

Contents of my .bat script (local to the SQL Server).

START /b sqlcmd -E -S localhost -d sq_E_single -Q "INSERT INTO sq_CI_2 SELECT * FROM sq_CI ORDER BY ID_INT"
START /b sqlcmd -E -S localhost -d sq_E_single -Q "INSERT INTO sq_CI_3 SELECT * FROM sq_CI ORDER BY ID_INT"


After executing the .bat script, let's take a look with Merrill Aldrich's viewer (can't recommend this tool* enough for exploration of fragmentation considerations).  Yeah, that's what I wanted to show.


What does sys.dm_db_index_physical_stats have to say?


SELECT   object_name(pstats.object_id) AS tname,
         index_type_desc,index_level,
         avg_fragment_size_in_pages,
         fragment_count,page_count,
         avg_fragmentation_in_percent 
FROM     sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE    pstats.object_id IN (object_id('sq_CI'),object_id('sq_CI_2'),object_id('sq_CI_3'))
ORDER BY pstats.object_id ;




So... avg_fragmentation_in_percent for sq_CI_2 and sq_CI_3 - the same as it was earlier. 
The avg_fragmentation_in_percent value is so low at ~0.0245% that I don't think anyone would do anything about it.

But fragment_count and avg_fragment_size_in_pages hint at the physical interleaving that is graphically displayed by the viewer.

What happens with a fullscan of sq_CI_3 clustered index (since it has the most reported fragments)?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI_3 WHERE varchar_4096 like '%b%'
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);

Well, first of all I shouldn't assume the clustered index will be used :-) In this case, although I can't be sure its why the optimizer chose the NCI, a good reason would be expected lower IO cost due to fewer fragments for full scan.


Of course, I didn't think of that until after I ran the query :-) So here's what sys.dm_io_virtual_file_stats and STATISTICS IO had to say.

320 reads, 33742848 bytes read => 4119 eight kb database pages read.  2 of those reads were counted by STATISTICS IO as physical reads, the remaining 318 reads reported by sys.dm_io_virtual_file_stats included the 4103 read-ahead pages. Remember that the optimized clustered index AND nonclustered index of sq_CI was shown to do full scan in 68 reads, 33619968 bytes read - 4104 8k pages.  This nonclustered index is using 252 more reads (and reading 15 more 8k pages) to scan the same logical contents.

We were coming for the clustered index, though.


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI_3 WITH (INDEX(pk_sq_CI_3)) WHERE varchar_4096 like '%b%'
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);

Check the estimated plan for a clustered index scan cuz I don't wanna make the same mistake twice.


Good to go.  Execute the SQL batch, check the results.



338 reads, 33742848 bytes = 4119 eight kb pages.  Very similar to the results from the nonclustered index, but requiring an additional 18 reads (which I'm fairly certainly were all to cover the read-ahead pages, even though both clustered and nonclustered index full scans used read-ahead for 4103 pages.

So, there you have it.  In a single database file, populating these two tables in parallel rather than one-at-a-time resulted in 252 (NCI) or 270 (CI, slightly less contiguous than NCI) more reads than the optimized sq_CI table.  The 15 more eight kb pages read by the non-optimized full scans in my opinion pales in comparison to the inflation of read operations.

There will be more fun, I promise!!  We've still gotta see what happens when we try to correct fragmentation in a "mature" database file, gotta see the effects of multiple files in a filegroup, and once we've got multiple files in the filegroup can't stay away from a -E startup option showdown!  (I'm getting there, @DBArgenis, its just taking me a while :-})


*The SQL File Layout Viewer v1.2 can be found at the link below. Its good stuff. Mainly for educational purposes, in current form not suitable for large databases.
http://sqlblog.com/blogs/merrill_aldrich/archive/2013/10/09/sql-file-layout-viewer-1-2.aspx
**Additional information on "start" command.
https://technet.microsoft.com/en-us/library/bb491005.aspx




No comments:

Post a Comment