Thursday, January 19, 2017

SQL Server, Dead Connection Detection & Windows Registry Setting TcpMaxDataRetransmissions - What's the real deal?

Dead Connection Detection is an important facet of SQL Server operational database support.  If the client side of a database connection is dead-and-gone, the server side is consuming an ephemeral port, may have an open transaction (and thus be preventing transaction log truncation), may be holding database locks (causing blocking), etc.  The sooner the connection is verified as dead and the database thread cleaned up, the sooner orphaned resources can be dealt with.  On the other hand, if dead connection testing is too aggressive, a sluggish client may not respond to keepalive probes in time and its connection may be closed.  This is a special consideration now for virtual servers involved in batch work.  The nature of their workload, and the "urge to converge" can lead to VMs that are CPU-saturated for short periods of time - aggressive dead connection detection could cause what seem to be network-related failures in these cases.  

TCP supplies a mechanism to handle Dead Connection Detection.  When a TCP port is opened, keepidle functionality can be enabled.  The relevant parameters are the keepidle time, the keepinterval, and the retries. (The names attached to these parameters vary a bit with platform.)  When enabled on a port, the keepidle timer counts down time on an idle connection.  Note - any traffic on the port - even the keepidle probes - resets the timer.  Once the timer reaches zero, a keepidle probe is sent.  If the keepidle probe isn't acknowledged by the other side of the connection within the time specified by keepinterval, another probe is sent.  The number of retries for the keepalive probes is also configurable on many platforms.  If any keepalive probe is ack'd by the other side, the keepalive timer resets (and the remaining probes for that round aren't sent).  If the last probe is sent and not ack'd within the keepinterval, in most cases the connection will be terminated.  In the case of SQL Server, that allows the session ID attached to the port to be terminated and cleaned up as well.  (Here's an important parenthetical: typically, the max number of keepalive probe retries and the max number of retransmits for any TCP packet is controlled by the same parameter.  Keepalive probes use a fixed interval, normal retransmits will double each previous interval until reaching the maximum.)       

The following blog post from 2010 explains Windows registry values KeepAliveTime, KeepAliveInterval, and TCPMaxDataRetransmissions and their roles in dead connection detection.

Things that you may want to know about TCP Keepalives
https://blogs.technet.microsoft.com/nettracer/2010/06/03/things-that-you-may-want-to-know-about-tcp-keepalives/

In SQL Server 2005, a valuable addition was made.  SQL Server would use a "Keep Alive" value specified specifically per instance, rather than the KeepAliveTime specified by the Windows registry.  It defaulted to 30000 ms/30 seconds.

The blog post below explains this new addition, and mentions that the interval for SQL Server will be a fixed 1000 ms/1 second, regardless of the KeepAliveInterval specified in the Windows registry. At the time SQL Server 2005 was introduced, TCPMaxDataRetransmissions from the Windows registries still controlled the maximum number of probes.

Understand special TCP/IP property "Keep Alive" in SQL Server 2005
https://blogs.msdn.microsoft.com/sql_protocols/2006/03/08/understand-special-tcpip-property-keep-alive-in-sql-server-2005/

But!!

Let's look at the following piece of developer-facing documentation.

SIO_KEEPALIVE_VALS control code

"On Windows Vista and later, the number of keep-alive probes (data retransmissions) is set to 10 and cannot be changed."

Huh.  On Windows Vista and later - so that would include Windows Server 2008, Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, and Windows Server 2016.

Its true.  Even though there are lots of places out there that still discuss setting TcpMaxDataRetransmissions in the Windows registry - its for naught.  See a recommendation to set it to 3 from its past default of 5?  Nah.  On Windows Vista and later, you'll get 10 as a max.

In a blog post coming soon, we'll look at TCP protocol properties "Keep Alive" and "Keep Alive Interval" for SQL Native Client 11.0.  I'll also discuss testing methodology, and implications.

*****

Here's a round-up of various documentation that discusses setting TcpMaxDataRetransmissions, without mentioning that from Windows Vista onward this is hardcoded at 10 and cannot be changed.


TechNet Library Windows Server > Windows Server 2008 R2 and Windows Server 2008 > Secure Windows Server
> Threats and Countermeasures Guide: Security Settings in Windows Server 2008 and Windows Vista
> Additional Registry Settings

This entry recommends changing TcpMaxDataRetransmissions from 5 to 3, for Windows Vista.

This page mentions TCPMaxDataRetransmissions for Windows Compact Embedded 2013.

This page about accessing SQL Azure database recommends setting registry value TCPMaxDataRetransmissions to 10, which is apparently its hardcoded value.

This kb article describes changing the TcpMaxDataRetransmissions registry value for Windows 7, Windows 2008 R2, Windows 2008, and Windows Vista.

This 2015 post describes setting TcpMaxDataRetransmissions registry value as a troubleshooting measure.

No comments:

Post a Comment