Saturday, April 26, 2014

Database Performance Investigation/Intervention, MTPoD, Last Resorts

In business continuity and disaster recovery planning, one of my favorite foundational ideas is Maximum Tolerable Period of Disruption, or MTPoD. Although most effort and budget will be invested in defining and implementing RPO and RTO (recovery point and recovery time objectives), during a crisis a settled MTPoD can guide the team and its resources, and it gives some level of confidence when a last resort is considered or employed.

Although most of my work these days focuses on performance and scalability rather than BC or DR, I'm a "last resort" kinda guy - and this is a "last resort" kinda blog. I didn't plan for this to be my position, but I ain't complainin' either :-)

See, there is room - even a need, I believe - in performance and scalability work for "last resorts". In order to be confident when considering performance/scalability last resorts, I believe MTPoD is very important.

An ideal performance/scalability investigation and intervention may look like this:
1. User feedback, task monitoring, or system monitoring triggers investigation
2. Activity, performance, resource utilization, error, and change logs are reviewed and compared between baseline and problem contexts.
3. Potential suspects are identified, and additional diagnostic monitoring may be put in place in production.
4. Problem recreation is attempted in a nonproduction environment.
5. If initial nonprod recreation attempts are unsuccessful, the additional production diagnostics may give more insight into reproducing the problem.
6. If the problem is reproduced in nonproduction, further diagnostics can be performed there. That's important because sometimes conclusive diagnostics are too invasive or consume too many resources for production environments.
7. At this point, production or nonprod diagnostics may have yielded a tentative, or even conclusive, diagnosis.  Corrective actions can then be implemented and validated in nonproduction.
8. Even if the issue cannot be reproduced in nonproduction, the absence of harmful side effects of potential correctives, such as a SQL Server Service Pack or Cumulative Update install, can be tested in nonprod.
9. Correctives can be promoted to production per change control processes.
10. Correctives in nonprod and prod are evaluated against expectations. Process iterates to step 2 if necessary.

But... sometimes it's even uglier than that.  Sometimes, system behavior is truly unhealthy... despite pulling in experts for weeks... gathering lots of diagnostic data... addressing evident tuning opportunities... the system is still unhealthy.

I'm an advocate for understanding database system behavior - healthy and unhealthy.  But I also know that there are times for last resorts... sometimes service disruption comes near to MTPoD. I've been there. And I've called for action... sometimes in a room of folks who look at my whiteboard scribbles and think I may not be in my right mind... and sometimes while other experts are voicing the protest that they have never "seen that work".  

I haven't always been right in those situations. I believe in stating my level of confidence, and not sugar-coating the risks.  But often, when I'm involved it's because the situation is already kinda desperate.

And, if I've had access to the diagnostics I ask for... I feel pretty good about my track record of being right :-)

Database performance and scalability investigation/intervention is tough work.  It's complex, in both breadth and depth.  It's risky.  Ask for a downtime to implement a "fix"... and if the benefit isn't evident afterward you may have burned a lot of trust.  (That's one reason I believe in expressing confidence level for both diagnoses and remedies.)  And if MTPoD is approaching, standard process may need to be suspended: maybe possible correctives need to be implemented even though potential diagnoses are highly speculative, maybe exceptions to normal change control need to be employed.  Having a sense of MTPoD can minimize regret when performance investigation/intervention deviates from standard operating procedure.*

*When an extraordinary "fix" resolves an issue that had no strong suspect for diagnosis, I do recommend continuing to pursue diagnosis in post-mortem analysis.  Sometimes, though, the nature of the previous problem won't be uncovered without an unreasonable cost.


Friday, April 25, 2014

AIX - listing hdisk device names for specific device types

This will list just the names of all of the hdisks on an IBM AIX LPAR:
lsdev -Cc disk -Fname

It's listing type 2107 and type puredisk right now.  Will it list the hdiskpower devices too?  I imagine it will; I'll have to check as soon as I line up another LPAR connected to a VMAX :)

I'm also hoping that the following will give me the names of the hdiskpower devices only:
lsdev -C -t power -Fname


Cuz this works...
# lsdev -C -t puredisk
hdisk137 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk138 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk139 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk140 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk141 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk142 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk143 Available 00-00-02 PURE MPIO Drive (Fibre)
hdisk144 Available 00-00-02 PURE MPIO Drive (Fibre)

So does this...
# lsdev -C -t puredisk -Fname
hdisk137
hdisk138
hdisk139
hdisk140
hdisk141
hdisk142
hdisk143
hdisk144
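
A couple of related commands come in handy here, too - just a sketch, and hdisk137 below is only an example from this LPAR, so adjust to taste.  Having lsdev report the type field alongside the name makes it easy to see which device types are present before filtering, and lsattr can pull queue_depth for any one of them:

# lsdev -Cc disk -F "name type"
# lsattr -El hdisk137 -a queue_depth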


*** Update 20140507 ***
Good to go...


#lsdev -C -t power -Fname
hdiskpower0
hdiskpower1
hdiskpower2
hdiskpower3
hdiskpower4
hdiskpower5
hdiskpower6
hdiskpower7
hdiskpower8
hdiskpower9
hdiskpower10
hdiskpower11
hdiskpower12
powerpath0
 



Whaddayaknow 'bout #SQLServer Trace Flag 4134?

I don't know much about trace flag 4134 - I ran into it while researching a primary key error that I believe was falsely triggered.  Searching for primary key errors in parallel insert queries I happened on the Connect item that is last in the list below.  Hmmmm...

Now I think maybe this trace flag might eliminate some intra-query deadlock conditions I've been tracking as well.  Maybe.

Here's what I wish I knew, but haven't been able to glean from any source:
1. Are global, session, and QUERYTRACEON scopes all valid for trace flag 4134?  (See the sketch after this list for how I'd exercise each one.)
2. The possibility of T4134 behavior becoming default behavior at some point is mentioned in the Connect item.  Has it become default behavior in SQL Server 2012 or SQL Server 2014?
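
For reference, here's how I'd exercise each of those scopes from a command prompt with sqlcmd once I get a test window.  A sketch only - MYSERVER is a placeholder, SELECT 1 just stands in for a real repro query, and whether 4134 actually honors every scope is exactly the open question above.

REM global scope - on for the whole instance until turned off or restart; verify with TRACESTATUS
sqlcmd -S MYSERVER -Q "DBCC TRACEON(4134, -1); DBCC TRACESTATUS(-1);"
REM session scope - enable and run the statement in the same connection
sqlcmd -S MYSERVER -Q "DBCC TRACEON(4134); SELECT 1;"
REM query scope - QUERYTRACEON hint (requires sysadmin)
sqlcmd -S MYSERVER -Q "SELECT 1 OPTION (QUERYTRACEON 4134);"
REM startup scope - add -T4134 to the instance startup parameters and restart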

Well... I'll update this post with anything more that I learn.  And if you land here for any reason, and know something about this trace flag beyond what I've got here... please comment.  Thanks!!

*****

FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 or in SQL Server 2008 R2
http://support.microsoft.com/kb/970198

FIX: Results may change every time that you run a parallel query in SQL Server 2005, in SQL Server 2008, or in SQL Server 2008 R2 if the query uses a ranking function and if the computer has eight or more CPUs
http://support.microsoft.com/kb/2546901

Parallel insert plan causes Primary Key Violation 
http://connect.microsoft.com/SQLServer/feedback/details/634433/parallel-insert-plan-causes-primary-key-violation

"… see the following KB: http://support.microsoft.com/kb/970198. Even that the symptom described in that KB is different, the root cause is the same. Please, note the mentioned Hotfix requires special trace flag 4134 to be enabled in order to take effect."
"The reason for protecting a change by a trace flag is to make sure we do not regress any other cases if the change may affect query plan choice. After some time, when we see stable behavior after the Hotfix, we remove trace flag protection for some fixes and make them active by default. For this fix, it may happen for future Cumulative Updates and the next Service Pack."

**** Update 2021 February 8 ****

I'll bring this additional KB article up from the comments into the main post :-)  Thanks to Aaron Morelli for pointing it out.
KB2589980 - FIX: Incorrect results or constraint violation when you run a SELECT or DML statement that uses the row_number function and a parallel execution plan in SQL Server 2008

As far as I know, the behavior enabled by trace flag 4134 was made default behavior in SQL Server 2012.  Thankfully, after SQL Server 2012 I haven't had to pay attention to it again. :-)

Tuesday, April 22, 2014

SQL Server: Win some, learn some... try a whole buncha

***Update 20140625***
Amit Banerjee (twitter: ) indicated that T345, listed below, no longer applies to current SQL Server builds.  Thanks, Amit!
***End update***

It isn't that I am opposed to query tuning... far from it!  However, I'm in a rather unusual position where changing the SQL text of any given query may require waiting for changes in two separate products from two different vendors, then waiting for adoption of those versions.  There are only a few things I'm good at - waiting isn't one of them.  Until then, indexes can be added (or removed), and stats and index maintenance strategies can be implemented and modified... but not too much can be done in terms of tuning individual queries, lest the customizations make future package upgrades more precarious.

So I do everything I can to tune the underlying hardware/driver/filesystem/OS stack.  The idea is to deliver as much reliability and performance capacity as possible, and coax the database into leveraging those attributes as much as possible.

This is why I spend a lot of time thinking and writing about NUMA, memory management, disk IO optimization and the like.

It's also the reason I spend so much time learning about SQL Server trace flags.  Sometimes the database can be encouraged to make better use of the system performance capacity for my workloads with a trace flag.  That is certainly the case with T8048, which removes a significant bottleneck in stealing query memory when there are multiple queries per NUMA node (or scheduler group) stealing lots of query memory.  There are other trace flags that have the effect of 'tuning' sets of queries, all at the same time.  For example, the enhanced join ordering available with trace flag 4101.  That one really helped me out of a jam - I saw some memory grants drop in size from 10 GB or more to 1 MB with no other change than adding that trace flag.  (I tested the benefits and looked for problems first with QUERYTRACEON, then promoted it to instance-wide - that promotion path is sketched below.)
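
For what it's worth, the promotion path for something like T4101 looks roughly like this.  Just a sketch - MYSERVER is a placeholder and the .sql file is whatever problem query you're evaluating; as always, this belongs in nonprod first.

REM step 1: evaluate the candidate flag on a single problem query via OPTION (QUERYTRACEON 4101)
sqlcmd -S MYSERVER -i problem_query_with_querytraceon.sql
REM step 2: once satisfied, enable instance-wide and verify
sqlcmd -S MYSERVER -Q "DBCC TRACEON(4101, -1); DBCC TRACESTATUS(-1);"
REM step 3: for persistence across restarts, add -T4101 to the startup parameters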

So this year, here are some trace flags that I'll be evaluating with my workloads.  There's not a lot of info out there about them.  As I test with them in SQL Server 2012 and 2014 I hope to provide some details about what I see - especially if I see no discernible difference at all.

T342
T345   http://support.microsoft.com/kb/625072/it
T2328 http://blogs.msdn.com/b/ianjo/archive/2006/03/28/563419.aspx
T4138 http://support.microsoft.com/kb/2667211
T9082 http://support.microsoft.com/kb/942906

For some information on these and many other trace flags, you can check out this post, and the accompanying pdf.
http://sqlcrossjoin.wordpress.com/2013/10/28/a-topical-collection-of-sql-server-flags/

Here's my disclaimer: trace flags are not to be trifled with.  Test them in isolation first, if possible, on a nonprod system.  Measure the effects, and compare them to expected/desired behavior and to the baseline.  When possible, test in full context (full scale workload at full concurrency) in nonproduction before promoting to production.

Wednesday, April 9, 2014

srarw - sequential read after random write (ZFS, WAFL, ReFS)

I like good marketing terms as much as the next guy, but what I really crave are terms and phrases that explain system and storage patterns and phenomena that I become very familiar with, and have a hard time explaining to other folks.

Several years ago, I was in exactly that position - having seen a condition that degraded database performance on a shadow paging filesystem* with HDDs and trying - in vain mostly - to explain the condition and my concern to colleagues.

*OK... a brief aside.  What on earth is a "shadow paging filesystem"?  Most familiar hard drive storage technology is "write-in-place".  Expand a database file by 1 GB, and that 1 GB of database file expansion translates through the OS filesystem, OS logical volume manager, SAN layers, etc to specific disk sectors.  Write database contents to that 1 GB file expansion, and the contents will be written to those disk sectors.  Update every single row, and the updates will take place via writes to those same disk sectors.  That is "write-in-place".
The alternative is a continual "redirect on write", or a shadow paging filesystem.  The initial contents get written to 1 GB worth of disk sectors A-Z.  Update every single row, and the updated contents don't get written in place, but rather get written to a new/different 1 GB of disk sectors a'-z'.
Once the new disk writes are complete, updates are made to the inode/pointer structure that stitches together the file (or LUN) presented to the host operating system.  The most common example of this type of continual redirect-on-write strategy is WAFL, used by the ONTAP operating system on NetApp storage.*

The issue was that a database object structure could be written initially completely sequentially, from the standpoint of the database, the database server filesystem/LVM, AND the storage system.  However, later updates could occur to a small and scattered sample of the data within that contiguous range.  After the updates (assuming no page splits/migrated rows due to overflow of database page/block boundaries), the data would still be contiguous from the database and database server filesystem/LVM standpoint.  But the updated blocks would be rewritten to a new location on disk - leaving contents that are sequential from the standpoint of the database and database server filesystem/LVM scattered on disk.

What's the harm in that?  Consider a full scan of such an object, whether in response to a user query or to fulfill an integrity check.  Before the interior, scattered updates, the 1 GB range may very well be read with a minimal number of maximum size read operations at the OS and storage levels, with a minimal amount of disk head movement.  After the scattered internal updates?  The number and size of OS level read commands won't change (because I've stipulated earlier that none of the updates caused page splits/migrated rows).  However, the number and size of commands to retrieve the data from the hard drives to the storage controllers in the array would almost certainly have changed.  And the amount of disk head movement to retrieve the data could also have changed significantly.  What if 6 months of time and accumulated data had accrued between the original, completely sequential write of the data and the later scattered updates?  That could introduce significant head movement and significant wait into the data retrieval.

When I began discussing this phenomenon with various engineers, the most common reply was: "yeah, but aren't you most concerned with OLTP performance, anyway?"  At that time in my life, that was completely true... however...
I also knew that a production system with a true, all-day, pure OLTP workload simply doesn't exist outside of very limited examples.  Integrity checks and backups are the main reason.  Show me a critical production database that operates without backups and integrity checks, and you've shown me a contender for a true all-day, every-day pure OLTP workload.
Otherwise, the degradation of sequential reads after scattered, internal, small updates is a REAL concern for every database operating on a shadow paging filesystem.  That's true whether the shadow paging filesystem is on the database server host (e.g. ZFS, of which I am a big fan) or on the storage subsystem (NetApp, or a system using ZFS).

Here's the kicker... someday I think it'll matter for SQL Server on Windows, too, regardless of the underlying storage.  Although Microsoft ReFS is not a good fit for SQL Server today (I'll come back and add links as to why later), I think future enhancements are likely to bring it into focus for SQL Server.

Finally, I found a name for the performance concern: SRARW.  Decoded: sequential read after random write.  And I found the name in a somewhat unlikely source: a paper written by a NetApp engineer.  In truth, I shouldn't have been all that surprised.  NetApp has a lot of brilliant people working for and with them.
Here's the paper that introduced the term SRARW to me:
Improving throughput for small disk requests with proximal I/O
Jiri Schindler, Sandip Shete, Keith A. Smith

Now... if you are running SQL Server or Oracle on NetApp, I encourage you to keep track of the pace of large sequential operations that always execute with the same number of threads.  If you see a significant slowdown in the pace, consider that SRARW and low level fragmentation may be one of the contributors.  NetApp has jobs that can be scheduled periodically to reallocate data... re-sequence data that has been made "outta order" due to small scattered interior writes.
There is also a NetApp "read reallocate" attribute that should be considered for some workloads and systems.
These items are better described at this location.
http://www.getshifting.com/wiki/reallocate
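
From memory, the 7-mode commands look something like what's below.  That's a sketch, with vol_sql standing in for your volume name - double-check the syntax against the docs for your ONTAP release (the link above covers it well) before running anything.

To gauge how "outta order" a volume has become:
reallocate measure /vol/vol_sql

To kick off a one-time full re-sequencing pass:
reallocate start -f /vol/vol_sql

And the "read reallocate" volume attribute mentioned above:
vol options vol_sql read_realloc on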

If you are using ZFS and SRARW performance degrades... unfortunately at this time your options are limited.
 

Friday, April 4, 2014

SQL Server transaction log writes - 64k aligned?



I spend a lot of time thinking about high speed ETL.  I also spend a lot of time thinking about DR solutions and backups.

Below you can read details on how I came to the following question (to which I don't yet know the answer, and will update when I do): are SQL Server's 60k transaction log writes 64k aligned?

*****
Aha!  I don't think I'll have to bust out procmon for this after all.  Just get a Windows striped volume, put my txlog (and only the txlog) on the striped volume, start perfmon monitoring the physical disks/LUNs in the striped volume (writes per second, current disk queue depth, write bytes/second), and spin up a workload that pushes the logwriter to as many in-flight 60k writes as possible.

If the average write size on the physical volumes is ~60k and the current queue length is 16 or less - awesome! That would mean striping is keeping each write intact and not splitting it, and that the queue depth on each LUN is lower (so that replication readers, etc. have room in the queue depth of 32 to do their stuff without pushing anything into the OS wait queue).

But if the average write size is ~32k... that would mean that most 60k writes by the log writer are being split into smaller pieces because they are not aligned with the 64k stripes used by the Windows LVM.

I guess even if the writes aren't 64k aligned, Windows striping may still be useful for my scenarios... but would have to stripe 4 LUNs together into a striped volume in order to lower the queue length for burdened log writer activity from 32 (with a single LUN) to 15.
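
If it helps anyone set up the same test, the counters I have in mind can be captured from the command line with typeperf.  A sketch - the output file name is arbitrary, and in practice I'd replace the (*) wildcard with the two physical disks that make up the striped volume:

typeperf "\PhysicalDisk(*)\Disk Writes/sec" ^
  "\PhysicalDisk(*)\Avg. Disk Bytes/Write" ^
  "\PhysicalDisk(*)\Current Disk Queue Length" ^
  -si 1 -sc 300 -o txlog_stripe_test.csv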

*****

Each SQL Server transaction log can sustain up to 32 concurrent in-flight writes, with each write up to 60k.  To get the fastest ETL, fast transaction log writes at queue length 32 are a necessity.  That means... put such a transaction log on its own Windows drive/mounted partition, since typically the HBA LUN service queue depth is 32.  Put other files on there, too, and the log writer in-flight writes might end up in the OS wait queue.  If writes wait on a full service queue of reads, they'll be ESPECIALLY slow.  There are other ways to make them especially slow - for example, to serialize the in-flight writes to a synchronous SAN replication strategy.  Anyhooo...

In massive ETL it's not unusual for the transaction log writer to wait on completion of 32 writes, each 60k, and not issue the next write until one of them completes.

Writes usually land in SAN write cache, and should be acked on receipt into write cache.  As such, as long as the write is in the HBA service queue (rather than in the OS wait queue), front end port queue depth isn't saturated, front end CPU isn't saturated, and SAN write cache isn't saturated - writes should be doggone fast already.  (The overhead of wire time - or 'wait for wire' time - for synchronous SAN replication also shouldn't be overlooked when evaluating write latency.)  So what can be done to improve these writes that are already typically pretty fast?

I'm not a fan of using Windows striped volumes for SQL Server data files - there's a fixed 64k stripe size.  That will circumvent large readahead attempts by SQL Server.  But for speeding up transaction log access, striped volumes may be just the thing I need.  (Robert Davis - @sqlsoldier - pointed out that unless there is underlying data protection, a Windows striped volume offers no data redundancy or protection.  I'm only going down this path because underneath the Windows basic disks, whether in a striped Windows volume or not, SAN storage is providing RAID10, RAID5, or RAID-DP protection.)

So... this is where the question of 64k alignment of the 60k writes comes in.

Assume 32 in-flight writes at 60k each, issued by the SQL Server logwriter to a txlog all by itself on a Windows striped volume composed of two equally sized basic disks.  If the writes are not 64k aligned, the same as the Windows stripes, the write activity passed down through the HBA will break down like the chart below.  It's painful to look at, I know.  Haven't figured out a less confusing way to represent it yet.  Basically, each 64k stripe on either basic disk will contain either a full 60k transaction log write plus only 4k of the next transaction log write, or two partial transaction log writes.  All told, 32 writes get broken down into 60 writes!  (By the way, this same idea is why it's important to have Windows drives formatted so that their start aligns with the expected striping.)
Basic Disk A        Basic Disk B
 1 - 60k
 2 -  4k             2 - 56k
 3 - 52k             3 -  8k
 4 - 12k             4 - 48k
 5 - 44k             5 - 16k
 6 - 20k             6 - 40k
 7 - 36k             7 - 24k
 8 - 28k             8 - 32k
 9 - 28k             9 - 32k
10 - 36k            10 - 24k
11 - 20k            11 - 40k
12 - 44k            12 - 16k
13 - 12k            13 - 48k
14 - 52k            14 -  8k
15 -  4k            15 - 56k
16 - 60k
                    17 - 60k
18 - 56k            18 -  4k
19 -  8k            19 - 52k
20 - 48k            20 - 12k
21 - 16k            21 - 44k
22 - 40k            22 - 20k
23 - 24k            23 - 36k
24 - 32k            24 - 28k
25 - 32k            25 - 28k
26 - 24k            26 - 36k
27 - 40k            27 - 20k
28 - 16k            28 - 44k
29 - 48k            29 - 12k
30 -  8k            30 - 52k
31 - 56k            31 -  4k
                    32 - 60k


So by striping the txlog, instead of 1 LUN with 32 writes and 1920k of write bytes in flight… it's 2 LUNs, each with 30 writes & 960k of write bytes outstanding.  That's a 50% reduction in write bytes per LUN, but only a 6% reduction in concurrent write IOs per LUN (from 32 to 30).

On the other hand, if the writes are 64k aligned, it'd be an even split: 16 writes and 960k of write bytes outstanding to each LUN, a 50% reduction in both outstanding writes and outstanding write bytes.
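
In case anyone wants to check my chart, the arithmetic behind it boils down to a few lines.  Here it is as a quick shell loop - a sketch only; it assumes write 1 happens to start right on a stripe boundary, and everything is in KB:

# 32 consecutive 60k logwriter writes laid over 64k stripes
i=0
while [ $i -lt 32 ]; do
    start=$(( i * 60 ))           # starting offset of this write, in KB
    off=$(( start % 64 ))         # offset within its 64k stripe
    if [ $(( off + 60 )) -le 64 ]; then
        echo "write $(( i + 1 )): 60k, unsplit"
    else
        echo "write $(( i + 1 )): split into $(( 64 - off ))k + $(( off - 4 ))k"
    fi
    i=$(( i + 1 ))
done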

So unless someone knows the answer, I guess we'll be busting out procmon and tracking transaction log write offsets once we crank the workload up to consistently hit 60k writes.  If they are 64k aligned, I'll be happy - I can blog in the near future about Windows striped volumes getting me out of a few jams.  If not... it'll probably be back to the drawing board. 

Thursday, April 3, 2014

Oracle on AIX - cio filesystem for redo logs; demoted IO and vmstat

The 'w' column in the vmstat output below... has something to do with demoted IO - which happens, for example, when Oracle redo logs with their default block size of 512 bytes are put on a JFS2 cio mounted filesystem with the default 4096 byte filesystem block size.

Jaqui Lynch referenced this vmstat option in her March 2014 presentation, page 41.
https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/1cb956e8-4160-4bea-a956-e51490c2b920/attachment/52ed9996-7561-42e8-a446-09f5f1414521/media/VUG-AIXPerformanceTuning-Part2-mar2414.pdf 

But I haven't found an explanation of what the conditions are for the w column to count a thread.  Not yet, anyway.  The second variation below is used in the perfpmr memdetails.sh script.

# vmstat -IW

System configuration: lcpu=12 mem=16384MB ent=1.00

   kthr       memory              page              faults              cpu
----------- ----------- ------------------------ ------------ -----------------------
 r  b  p  w   avm   fre  fi  fo  pi  po  fr   sr  in   sy  cs us sy id wa    pc    ec
 1  1  0  0 1835258 1779800   0   2   0   0   0    0  24 1991 1625  0  0 99  0  0.01   1.0


# vmstat -W -h -t -w -I 2 2

System configuration: lcpu=12 mem=16384MB ent=1.00

     kthr              memory                         page                       faults                 cpu                   hypv-page           time
--------------- --------------------- ------------------------------------ ------------------ ----------------------- ------------------------- --------
  r   b   p   w        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa    pc    ec   hpi  hpit   pmem   loan hr mi se
  2   0   0   0    1824787    1788933     0    15     0     0     0      0    11   2874  1698  3  3 94  0  0.10  10.0     0     0  16.00   0.00 16:34:26
  2   0   0   0    1824802    1788916     0     0     0     0     0      0     6   1793  1680  1  1 98  0  0.05   4.9     0     0  16.00   0.00 16:34:28
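
As an aside, checking whether a given redo log filesystem is even a candidate for demoted IO is straightforward - compare the mount options and the JFS2 block size against the 512 byte redo block size.  A sketch, with /oraredo standing in for the real mount point:

# confirm the filesystem is mounted with cio
mount | grep oraredo
# the -q flag shows extra JFS2 attributes, including the filesystem block size
lsfs -q /oraredo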
 

Tuesday, April 1, 2014

IBM Power AIX: perfpmr pipes "pile" to kdb

The topas utility in AIX has some information, specifically in 'topas -M', that I'd like to be able to log over time.  Namely, the amount of filesystem cache associated with each SRAD (the logical CPUs and memory within an LPAR that come from the same socket) and the local/near/far dispatch ratio for threads on each logical core.

Check it out on Nigel Griffiths' blog
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/local_near_far_memory_part_2_virtual_machine_cpu_memory_lay_out3?lang=en


Logging topas from an unattended script is painful.  But most of the work I fret over should be monitored unattended.

Nigel mentioned in his post that he's seen information similar to 'topas -M' in perfstat API programming.  He's got a great page on that, linked below.  Very promising, but after investing a significant amount of time today I still wasn't able to get what I wanted.  I'll come back to perfstat in the future, I'm sure.  Especially because if I can write something using the perfstat API, at least it won't need root access like most of the kdb work I end up doing.
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool


So I'll take a look at what perfpmr is doing these days that might be relevant.

Saw this, wanted to make sure I jotted it down before it gets lost in the sea of my life.
 
[root@sasquatch_mtn: /root]
# echo pile | kdb
           START              END <name>
0000000000001000 00000000058A0000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C0208380
(0)> pile
ADDRESS                NAME             cur_total_pages
0xF100010020990800     NLC64            0x00000000000000FC
0xF100010020990900     NLC128           0x0000000000000014
0xF100010020990A00     NLC256           0x0000000000000000
0xF10001002BBD0800     iCache           0x0000000000002080
0xF10001002BBD0900     iCache           0x0000000000002080
0xF10001002BBD0A00     iCache           0x0000000000002080
0xF10001002BBD0B00     iCache           0x0000000000002080
0xF10001002BBD0C00     iCache           0x0000000000002080
0xF10001002BBD0D00     iCache           0x0000000000002080
0xF10001002BBD0E00     iCache           0x0000000000002080
0xF10001002BBD0F00     iCache           0x0000000000002080
0xF10001002BBD0000     iCache           0x0000000000002080
0xF10001002BBD8100     iCache           0x0000000000002080
0xF10001002BBD8200     iCache           0x0000000000002080
0xF10001002BBD8300     iCache           0x0000000000002080
0xF10001002BBD8400     iCache           0x0000000000002080
0xF10001002BBD8500     iCache           0x0000000000002080
0xF10001002BBD8600     iCache           0x0000000000002080
0xF10001002BBD8700     iCache           0x0000000000002080
0xF10001002BBD8800     iCache           0x0000000000002080
0xF10001002BBD8900     iCache           0x0000000000002080
0xF10001002BBD8A00     iCache           0x0000000000002080
0xF10001002BBD8B00     iCache           0x0000000000002080
0xF10001002BBD8C00     iCache           0x0000000000002080
0xF10001002BBD8D00     iCache           0x0000000000002080
0xF10001002BBD8E00     iCache           0x0000000000002080
0xF10001002BBD8F00     iCache           0x0000000000002080
0xF10001002BBD8000     iCache           0x0000000000002080
0xF10001002BCF3100     bmIOBufPile      0x0000000000000000
0xF10001002BCF3200     bmXBufPile       0x0000000000000FEC
0xF10001002BCF3300     j2SnapBufPool    0x0000000000000000
0xF10001002BCF3500     logxFreePile     0x0000000000000004
0xF10001002BCF3600     txLockPile       0x000000000000151C
0xF10001002BCF3700     j2VCBufferPool   0x0000000000000080
0xF10001002BCF3900     j2VCBufferPool   0x0000000000000078
0xF10001002BCF3D00     j2VCBufferPool   0x000000000000007C
0xF100010020990B00     j2VCBufferPool   0x0000000000000078
0xF1000100358AD500     j2VCBufferPool   0x00000000000000EC
0xF1000100358AD600     j2VCBufferPool   0x0000000000000078
0xF1000100358AD700     j2VCBufferPool   0x00000000000000F4
0xF1000100358ADB00     j2VCBufferPool   0x00000000000000F8
0xF1000100E0D28200     j2VCBufferPool   0x0000000000000104
0xF1000100E1BC5500     vmmBufferPool    0x0000000000000800



Yeah... that one I probably won't get to until sometime next year.


# echo "mempsum -psx" | kdb
           START              END <name>
0000000000001000 00000000058A0000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C0208380
(0)> mempsum -psx
MEMP VMP SRAD PSZ NB_PAGES  MEMP%   SYS% LRUPAGES   NUMFRB    NRSVD  PERM%

 000  00   0  ---    7.5GB ------  50.0%    7.5GB    3.5GB    0.0MB  11.3%
 001  00   0  ---    7.5GB ------  49.9%    7.5GB    3.5GB    0.0MB  11.3%  
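
Since the whole point is unattended collection, when I get around to it I'll probably wrap that mempsum call in something like the ksh sketch below.  Interval, count, and log location are whatever suits - and like most kdb poking, it needs root:

#!/usr/bin/ksh
# append timestamped "mempsum -psx" output to a log once a minute
LOG=/tmp/mempsum_log.txt
while true; do
    date >> $LOG
    echo "mempsum -psx" | kdb >> $LOG 2>&1
    sleep 60
done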


# echo "lrustate 1" | kdb
           START              END <name>
0000000000001000 00000000058A0000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C0208380
(0)> lrustate 1

LRU State @F1000F0009540840 for mempool 1
> LFBLRU
> not fileonly mode
> first call to vcs (lru_firstvcs)
LRU Start nfr        (lru_start)        : 0000000000000000
mempools first nfr   (lru_firstnfr)     : 0000000000000000
numfrb this mempool  (lru_numfrb)       : 0000000000000000, 0
number of steals     (lru_steals)       : 0000000000000000, 0
page goal to steal   (lru_goal)         : 0000000000000000, 0
npages scanned       (lru_nbscan)       : 0000000000000000, 0
addr, head of cur pass list   (lru_hdr) : 0000000000000000
addr, head of alt pass list  (lru_hdrx) : 0000000000000000
current lru list          (lru_curlist) : 0000000000000000
current lru list object    (lru_curobj) : 0000000000000000
pgs togo p1 cur obj        (lru_p1_pgs) : 0000000000000000, 0
pages left this chunklet (lru_chunklet) : 0000000000000000, 0
scans of start nfr (lru_scan_start_cnt) : 00000000
lru revolutions      (lru_rev)          : 00000000
fault color          (lru_fault_col)    : 00000000, 0 BLUE
nbuckets scanned     (lru_nbucket)      : 00000000
lru mode             (lru_mode)         : 00000000 DEF_MODE
request type         (lru_rq)           : 00000000 LRU_NONE
list type to use     (lru_listidx)      : 00000000 WORKING
page size to find    (lru_psx)          : 0000
MPSS fs'es to skip   (lru_mpss_skip)    : 00000000
MPSS fs'es failed    (lru_mpss_fail)    : 00000000
numperm_global     (lru_numperm_global) : 00000000
global numperm%    (lru_global_numperm) : 00000000  0.0%
perm frames (lru_global_perm_n4kframes) : 0000000000000000
lruable frames   (lru_global_n4kframes) : 0000000000000000
16m mpss type        (lru_16m_type)     : 00 LRU16_IDLE
16m mpss seqn        (lru_16m_seqn)     : 00000000