Wednesday, May 28, 2014

SQL Server backups - HBA maxtransfersize; EMC VNX write-aside; backup target isolation

The SQL Server T-SQL backup command allows setting the maxtransfersize parameter to determine the largest SQL Server host side disk IO.  By default, a 1 mb maxtransfersize is used.

The fibre channel HBA also has a max transfer size - default on Windows is 512kb.  Emulex allows max transfer size to be up to 4 mb on Windows in latest driver releases, QLogic allows up to 2 mb. In general, I recommend increasing the HBA max transfer size to at least 1 mb... why let SQL Server *want* to send larger IOs, only to be denied by the HBA :-)

So, let's say the system is using an Emulex HBA, and the backup source database files and the backup targets are on LUNs served up from an EMC VNX.

I generally recommend two SQL Server backup target LUNs on EMC VNX or CLARiiON SANs, with one target LUN on each storage controller to balance research utilization.  In most cases, you'll want to use backup compression and I generally recommend one or two times the number of data files from the source database as the number of target files, with an equal number on each target LUN.

In most cases, though, I think SQL Server sites are using a single large LUN for a given SQL Server instance's backup targets, even if that LUN contains multiple files used as backup targets.

Now for the fun part.  CLARiiON and VNX have a limited amount of write cache on each controller.  Write cache destaging is controlled by configurable low and high write-pending watermarks.  Exceed the high watermark, and write cache destaging will become more aggressive until the write-pending is back down to the low watermark.  During more aggressive write cache destaging, reads have their priority lowered and read latency increases.

During extremely write intensive workloads, write pending levels can easily rise... then read latencies rise.  Cause forced flushes of a given controller's write cache, and all tenants of the controller will notice the latency increase until the flush has completed.

Backups are write intensive.  Full backup of a 1 TB data footprint SQL Server database, even with backup compression, can easily write out 300 GB or more to the target.

If the backup reads are well tuned (enough LUNs for the source database files, high enough queue depth, high enough BUFFERCOUNT parameter value), the backup writes will put an awful lot of write pressure on the target LUNs.

EMC has thought about that ahead of time.  The CLARiiON and the VNX allow configuring a write-aside value. Well... unless you are using virtual provisioning.  If using virtual provisioning, the write-aside default of 1 mb cannot be overridden.

When the write-aside is in effect, writes larger than that threshold bypass write cache and are written directly to the storage devices.

There's at least one good reason to want to do that: if there are other tenants of the SAN, by keeping the write cache free of the huge writes coming from the backup, there is less chance of dragging down performance for other applications.

Optimally, the backup write targets will be located on a set of physical disks separate from the source databases... and ideally separate from other activity at that time.  That would allow noncached writes to go to the hard disks with very little head movement - very high transfer rate expected.  In fact, I'd recommend this when possible even if the writes are cached... because without any other competition the cached writes can destage to these disks quite quickly.  And... later when its time to pull the backup off of those disks to get it into TSM, or NetBackup or ???, if there's no other activity the read transfer rate should be pretty high, too because of the minimized disk head movement.

So... for optimal SQL Server backups, a few fundamental things to think about before getting to the SQL Server level:  HBA maxtransfersize, array write cache effect of the backup, and physical disk isolation for the backup targets.

    

No comments:

Post a Comment