Friday, July 29, 2016

SSAS Multidimensional Cube Processing - What Kind of Work is Happening?

**Update**
I added a post on August 2, 2016 - answering some questions and leading to others :-)
SSAS Multidimensional Cube Processing - Shrinking the unfamiliar window
http://sql-sasquatch.blogspot.com/2016/08/ssas-multidimensional-cube-processing.html
****
 
In order to evaluate SQL Server systems, I've worked perfmon data over and over until I was satisfied that I could tell a high level story of what work was being done(rows scanned/sec,rows returned/sec, rows written/sec, log flush bytes/sec) and who was doing that work(active requests, active parallel threads).

I'm starting to work through a similar process for SSAS MultiDimensional Cubes - processing first.

Some graphs from perfmon.

Because Analytics Services is on a separate VM from the RDBMS server, I know that all of the incoming data to be worked on is coming over the wire.

So what kind of relationship is there between network bytes received and CPU utilization?


That trends pretty well.  I've put a box around an interesting area. In that area, bytes received per second is about the same as earlier periods of time.  But CPU is quite a bit lower.
What's being done with the data that is coming in?

Well, that makes sense.  Rows are read, converted, and written.  Looks like writing rows is timer-based, or maybe threshold-based?

But the unusual timeperiod is still unusual.  Rows are read, but not converted or written.  Hmmm.

OK, lets take a look at rows read/converted/written against CPU utilization.


OK.  CPU utilization tracks pretty will with rows read/converted outside of the mystery range following 9:30.
But now 3 additional interesting timeperiods are evident, in the red boxes.  CPU utilization is much higher in the red boxes than would be predicted by rows read/converted.

What's the CPU doing at about 8:05, 8:45, 9:30? (Not forgetting the original question about 9:40 to about 10:15.)

Maybe there's other kinds of work I have included in these graphs?

How about index rows/sec?


Aha!  OK.  So index rows/sec is another part of the picture, and helps to explain some of the CPU utilization not explained by rows read/converted/written.  But it still doesn't explain the tail end of the graph.

I also noticed that there is a counter for rows created/sec.


All right. Got some rows created at 8:05 am, and 8:45 am.  None at 9:30.  And none in the final mystery period.

So the work of read/convert/create/write rows and index rows is a pretty good profile of the CPU work done up until about 9:40 am.

What type of threads were doing the work? What about the non I/O threads in the processing pool.


Wow.  Very weak relationship between the number of these threads and CPU utilization.  These processing pool non-I/O threads may be assigned to tasks, and in that sense "busy".  But in this case, it sure seems like they are mostly busy waiting.

There are a couple more thread types to consider.


So the I/O threads match well to the index rows and created rows.  There's only 1 busy query pool thread over the whole time. 

The processing pool non I/O threads must be doing most if not ALL of the read/convert/write work.  But most of these threads must be spending most of their time waiting, since the number of "busy threads" has almost no correlation to CPU utilization.

But I still don't know what was going on in that mystery period.  That'll have to be for another day...