Sunday, February 10, 2013

SQL.Sasquatch, SQL_Sasquatch, SQL-Sasquatch

I don't like hyphens.  In handles, names, and URLs I'd much rather have periods or underscores.  Alas, Blogger will only let me have a hyphen in my blogspot URL.  So, SQL-Sasquatch.blogspot.com it is.

I am a square peg in a round hole, maybe a bit of a bull in a china shop.

The SQL queries I write often give me away as a novice: I'm not from around here.

I built most of my expertise working with a non-relational database on UNIX platforms.  And I paid a lot of attention to memory management, processor scheduling, and storage IO.  I've seen long-forgotten or ignored memory management or processor scheduling settings be the key to unlocking the performance potential of systems.  I've seen crippling effects of storage bottlenecks caused by insufficient IO buffers or request slots, imbalanced storage utilization, writes-blocking-reads, copy-on-write overhead, filesystem fragmentation, file fragmentation, database object fragmentation, cache saturation, back end port/bus saturation, front end port saturation, RAID 5 random write overhead, replication overhead, and a few others that I will only remember (if I'm lucky) the next time I encounter them.

I'm glad when I hear people talk about capturing a baseline for future performance and resource utilization evaluation.  But few recognize the value in challenging the assumption that the system was optimally configured when the baseline was captured.  Some system limits only cause havoc after a certain resource utilization or concurrency threshold is crossed, and identifying pre-existing latent performance or scalability limitations can be really hard.  Often the most obvious performance opportunity is purported to be the most significant one.  Sometimes this leads to a never-ending engagement of code tuning in pursuit of healthy performance.  Sometimes it leads to provisioning additional system resources (throwing money at it).  In some cases, intense investigation and some system tuning could give back enough that code and system provisioning could later be reviewed for additional benefits, without the stress of an ongoing crisis.

One of the reasons I'm valuable to have around is that I'll challenge the assumption that the configuration was optimal at baseline, and the assumption that configuration or activity changes since baseline are the most significant factor in performance changes.  Not necessarily with the finesse you'd expect from someone named sql_sorcier, for example.  Maybe only fittingly for someone named sql_sasquatch.

2 comments:

  1. Hi

    I am new to replication and wonder what replication overhead is. Where are the performance issues introduced?
    By the log reader reads?
    By distribution database writes?
    By subscribers?

  2. Hello obulay! Replication of a database can take place at several different levels - database, filesystem/LVM (IBM LVM mirroring is the one I've worked with most at the LVM layer), storage. On each of those three levels there is the possibility of synchronous or asynchronous replication. I like to start negotiations with asynchronous database replication, and ask for solid reasons to vary :)

    Synchronous replication has the most significant potential for performance impact: even if the local and remote systems can both absorb the resource utilization of replication activity without breaking a sweat, the round trip wire time and acknowledgment required by synchronous replication can impose significant additional latency on every write operation on the local system.
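    A back-of-envelope sketch of that round-trip penalty (all numbers below are hypothetical illustrations, not measurements from any particular system):

```python
# Back-of-envelope model of synchronous replication write latency.
# All numbers are hypothetical, chosen only to illustrate the shape of the math.

local_write_ms = 0.5   # local storage commit latency
round_trip_ms = 8.0    # wire round trip to the remote site plus acknowledgment

# Without replication, a write completes at local storage speed.
standalone_ms = local_write_ms

# With synchronous replication, every write also waits for the remote
# acknowledgment before it can complete.
sync_replicated_ms = local_write_ms + round_trip_ms

print(f"standalone:  {standalone_ms:.1f} ms per write")
print(f"synchronous: {sync_replicated_ms:.1f} ms per write")
print(f"slowdown:    {sync_replicated_ms / standalone_ms:.0f}x")
```

    Even a modest round trip can dwarf local storage latency, which is why the wire time matters more than the raw resource utilization on either system.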

    Although asynchronous replication can avoid some of the performance impact associated with synchronous replication, there's still a lot of activity involved. And asynchronous replication ALWAYS implies some maximum tolerance for latency between the systems - most typically represented by the finite capacity of a log location on the local system. Once that tolerance is reached, either the replication relationship must be severed, or activity on the local system may be suspended or may degrade into something similar to the one-in, one-out scenario that comes along with cache saturation.
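    A toy model of that log saturation, with entirely hypothetical rates, shows how the one-in, one-out throttle emerges once the local write rate outruns the remote drain rate:

```python
# Toy model of asynchronous replication log saturation.
# All rates and capacities are hypothetical illustrations.

log_capacity = 100       # finite log slots on the local system
incoming_per_tick = 10   # local write rate
drained_per_tick = 6     # rate at which the remote side absorbs changes

backlog = 0
ticks_until_full = 0
while backlog < log_capacity:
    # Each tick, the backlog grows by the surplus of incoming over drained.
    backlog = min(log_capacity, backlog + incoming_per_tick - drained_per_tick)
    ticks_until_full += 1

# Once the log is full, each new local write must wait for a slot to drain:
# effectively one-in, one-out, throttled to the remote drain rate.
print(f"log full after {ticks_until_full} ticks; "
      f"local writes now throttled to {drained_per_tick} per tick")
```

    The point is that asynchronous replication doesn't eliminate the dependency on the remote system; it only defers it until the local tolerance is exhausted.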

    Anyway... beyond the basics, it requires details of at least the local system, remote system, connecting network, and replication mechanism to determine where replication activity and resource utilization could present itself as overhead leading to performance impact.
