Overview of KeepAlive
Any discussion of so-called "keep-alive" functionality must start by answering
the question: "What does 'keep-alive' mean?" As one specification
succinctly states:
  A "keep-alive" mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent.
Keepalive mechanisms appear in many different protocols, under various names. Protocols may be layered on top of one another; for example, HTTPS consists of HTTP layered over SSL/TLS, itself layered over TCP/IP. Each protocol layer may have its own form of keepalive functionality.
TCP KeepAlive
For TCP, the definitive specification for keepalive functionality is
RFC 1122. While RFC 1122 includes a good discussion of why TCP keepalives are
meant to be off by default, in practice TCP keepalives do have value,
especially when dealing with network equipment (such as routers, firewalls,
NATs) between the client and the server; such equipment might terminate the
connection when the connection has been idle for too long. The use of TCP
keepalives can help to prevent such network equipment from breaking the
connection needlessly.
How TCP KeepAlives Work
OK, so you want to use the TCP keepalive functionality in your program. The
question is "How exactly does the TCP keepalive feature work?" Good question.
Answering this requires three different numeric values: the idle time,
the number of probes to send (the probe count), and the interval
time between each probe. Remember, though, that the SO_KEEPALIVE
socket option must be enabled on the socket
in order for the TCP keepalive feature to be used.
First, let's assume that you have created a TCP connection, and have transferred data back and forth on that connection. All of the data have been transferred, but you have not closed the connection, so now it is idle. How long does that connection sit idle, with no data transferred at the TCP layer, before one end of the connection or the other starts to wonder whether the connection is still alive? This amount of time where the connection sits idle is the TCP keepalive idle time; the default idle time is two hours (per RFC 1122).
Our TCP connection has been sitting idle now for the amount of time given by the idle time value; what happens then? At this point, the end of the TCP connection with TCP keepalive enabled sends out a "probe". This probe is just a small TCP packet which requires a response from the other side. Once the probe has been sent, the amount of time given by the interval time value passes. If we hear nothing back from the remote peer within the interval time, we send another probe. This process repeats until either a) we receive a response back from the peer, or b) the probe count value has been reached.
Let us assume that our TCP connection was idle for so long that TCP keepalive probes were sent, and still no response was received. What happens then? At this point, the connection is broken. When the programs at either end of the connection next try to read or write data on that connection, the read/write attempts will fail.
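The three values above map onto socket options in most programming languages. The following is a minimal Python sketch, not ProFTPD's implementation; the helper name enable_tcp_keepalive is made up for illustration, and the TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL constants are assumed to exist only on some platforms (Linux has all three), so the sketch skips any that are missing or rejected.

```python
import socket


def enable_tcp_keepalive(sock, idle=None, count=None, interval=None):
    """Enable SO_KEEPALIVE, then tune idle/count/interval where possible.

    Returns a dict of the tuning options that were actually applied.
    (Hypothetical helper, for illustration only.)
    """
    # Without SO_KEEPALIVE, no probes are ever sent.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    tunables = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
        ("TCP_KEEPINTVL", interval),  # seconds between probes
    )
    applied = {}
    for name, value in tunables:
        option = getattr(socket, name, None)  # not every platform defines these
        if value is None or option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # the TCP stack refused per-socket tuning
        applied[name] = value
    return applied
```

If the returned dict is empty, only the on/off switch took effect and the system-wide values apply.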
When To Use TCP KeepAlive
When the TCP client is connected directly to the TCP server, it usually does
not matter whether one end or the other uses TCP keepalive. As long as
one of them does, a broken connection can be detected.
  client <-------------------------------> server
However, if the TCP client connects to the TCP server via proxies/routers/firewalls/NAT, the picture changes. When this happens (and it is the common scenario), then both sides may need to use TCP keepalive to learn when their side of the proxied connection is broken:
  client <-----------> NAT <-------------> server
In this situation, there are actually two different TCP connections involved: between the client and the NAT, and between the NAT and the server. Each TCP connection may break independently of the other, which is why both ends of the connection (client and server) may need to use TCP keepalives. Use of TCP keepalives also helps here because when the router/firewall/NAT receives the TCP keepalive probe, it may (depending on the network equipment in question) cause the router to reset any timers that were about to close the TCP connections on either side.
Why are TCP KeepAlives Useful for FTP?
"This is all very fascinating", you say, "but what does it have to do with
ProFTPD and FTP?" If you have ever had an FTP download (or upload)
take a very long time, only to have that transfer time out in the
middle, then TCP keepalives may prevent the timeout.
Consider what happens for FTP transfers which take a long time (either due to very large file(s) being transferred, or a slow connection): you have one TCP connection for the control connection, and a separate TCP connection for the data transfer connection. All of the bytes are being transferred over the data connection, so that data connection is certainly not idle -- but while the data transfer is occurring, the control connection is idle! And let's assume that your FTP connections are going through some NAT device in between the client and the server. That NAT may not be very smart; it may not know that the two different TCP connections of your FTP session are related to each other; it only sees one idle TCP connection, and one busy TCP connection. If that FTP control connection is idle for too long, then the NAT may close it (in order to keep valuable space in its state tables available for TCP connections that actually need to transfer bytes). (Some NATs have been known to close TCP connections that have been idle for only 5 minutes.) The FTP server sees that the FTP control connection is closed, and aborts the data transfer. What a mess!
If either the FTP server or the FTP client had used TCP keepalives on the control connection, then maybe that NAT would have seen the TCP keepalive probes, and not closed the idle control connection. So how can we make sure that either the client or the server has TCP keepalives enabled?
In proftpd-1.3.5rc1 and later, ProFTPD's SocketOptions directive supports a
keepalive parameter for controlling whether the server uses TCP
keepalives, e.g.:
  # Disable use of TCP keepalives
  SocketOptions keepalive off

  # Enable use of TCP keepalives (this is the default)
  SocketOptions keepalive on
In addition, on some Unix platforms, the SocketOptions
directive's keepalive parameter can do finer-grained tuning of the
TCP keepalive values:
  # Enable use of TCP keepalives, with the given idle/count/interval values
  SocketOptions keepalive 7200:9:75
In general, though, you should use the system-wide defaults unless you are running into data transfer timeout issues. If you are seeing timeouts, try using the keepalive parameter of SocketOptions to
gradually reduce the idle timeout by small increments (e.g. 10-15
seconds); if that does not help, increment the count by 1 at a
time (remember that each probe means extra data transferred); if that
still does not help, increase the interval time. Do not reduce the
interval time, since that is the amount of time that you should wait to
see if the other end responds before sending another probe. Waiting less
time for the other end to respond means a greater chance of killing your TCP
connection unnecessarily.
Not all TCP stacks let the application control the TCP keepalive timeout after which the first probe will be sent, or the total number of probes sent, or how much time between probes will be used. That is, many TCP stacks only allow enabling/disabling of TCP keepalive. If TCP keepalive is enabled, then the standard values of 2 hours for the idle timeout, a count of 9 probes, with 75 seconds between probes, will be used.
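To get a feel for what these values mean in practice, it can help to work out roughly how long a dead peer goes unnoticed: the idle time, plus one interval for each unanswered probe. The sketch below does that arithmetic for an idle:count:interval string; both function names are hypothetical, and the parser only mimics the format shown above, it is not ProFTPD's own parser.

```python
def parse_keepalive_spec(spec):
    """Split an 'idle:count:interval' string into three integers."""
    idle, count, interval = (int(part) for part in spec.split(":"))
    return idle, count, interval


def worst_case_detection_seconds(idle, count, interval):
    """Approximate seconds from the last traffic until the connection is
    declared broken, assuming no probe is ever answered."""
    return idle + count * interval


idle, count, interval = parse_keepalive_spec("7200:9:75")
print(worst_case_detection_seconds(idle, count, interval))  # 7875
```

With the standard values, that is 7875 seconds, or about 2 hours and 11 minutes, before an application learns that its idle connection is gone.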
Since many platforms do not allow fine-grained tuning of TCP keepalive values, especially on a per-service basis, other means for checking whether the connection is still alive must be used. And that leads us to application-level keepalive mechanisms.
How FTP KeepAlive Works
Since the FTP server cannot do anything to test whether the FTP session is
alive, the FTP client must do the tests. The easiest way to test whether an
FTP session is alive is to send an FTP command. And fortunately,
RFC 959, Section 4.1.3
defines the NOOP
("No Operation") command whose sole purpose is
to elicit the "OK" response from the FTP server. This makes the NOOP
command the ideal way to test whether the FTP server is
still alive and listening to the FTP client.
Some firewalls/routers know about this NOOP trick, though, and
may filter out/drop that FTP command. FTP clients, then, have been known to
resort to a number of other FTP commands for use as FTP keepalives.
Sadly, some FTP servers cannot handle receiving an FTP command on the control
connection while they are in the middle of transferring data on the data
connection (proftpd can handle this). But that may not
matter, for the purposes of FTP keepalives; all that matters is that at the
TCP level, the bytes were sent by the client and acknowledged by the server's
TCP stack.
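A client-side NOOP keepalive can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name start_noop_keepalive is made up, the ftp argument is any object with a voidcmd() method (such as an ftplib.FTP instance), and the caller is expected to hold the same lock around its own commands, since two threads talking on one control connection at once would mix up their replies.

```python
import threading


def start_noop_keepalive(ftp, interval_seconds, stop_event, lock=None):
    """Send NOOP on the control connection every interval_seconds.

    Runs in a background thread until stop_event is set. If the server's
    reply is not a success reply, voidcmd() raises and the thread ends.
    (Hypothetical helper, for illustration only.)
    """
    lock = lock or threading.Lock()

    def loop():
        # Event.wait() returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_seconds):
            with lock:
                ftp.voidcmd("NOOP")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The interval should be comfortably shorter than the shortest idle timeout of any device on the path; for a NAT that drops connections after 5 idle minutes, something like 60 seconds would be a cautious choice.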
FTP Client-Specific KeepAlive Settings
Not every FTP client supports the FTP keepalive functionality. If you want
to try out FTP clients which do support FTP keepalives, you might look
into the ftp:nop-interval setting for lftp, or a similar setting in
another client.
In the case of ProFTPD's mod_sftp module, the way to configure
SSH2 keepalives is the SFTPClientAlive directive. When configured, the
mod_sftp module sends CHANNEL_REQUEST
messages for "keepalive@proftpd.org" in order to solicit a response
from the connected client.
KeepAlive in Other Protocols
Many application protocols end up reinventing the keepalive feature in some
way, usually as a "ping/pong" mechanism where a "ping" is sent every so often
by one side, with a "pong" response expected from the other end of the
connection.
Most HTTP connections have no need of a keepalive mechanism since HTTP
connections are usually short-lived, and since data is usually flowing
in one direction or the other on the HTTP connection (thus an HTTP connection
is usually not idle long enough to warrant a keepalive feature).
HTTP long polling (i.e.
RFC 6202) is an exception;
and for HTTP long polling connections, use of TCP keepalives may be needed.
But the HTTP protocol itself does not specifically define a way for either
end to arbitrarily send data across the connection for the purpose of
determining whether the connection is still alive. (HTTP keepalive refers
to a different concept, i.e. that of telling the server to not
close the connection after sending its response so that the connection can
be reused, thus "kept alive".)
For long-lived LDAP connections, keepalive functionality can be implemented
by using the Abandon operation. The idea is to have the client send a request that it knows the
server will ignore/discard; the act of transmitting the request over the
connection acts to keep any intermediaries on the network (router/NAT/firewall)
from closing an "idle" connection prematurely.
The WebSocket protocol, defined in
RFC 6455, does have need
of a keepalive mechanism, since it establishes a long-lived connection.
Thus the RFC defines ping/pong messages; see Section 5.5.2
(Ping) and Section 5.5.3 (Pong).
Additional Reading
         <------------ SSH2 REQUEST ----------->
         -------------- FTP NOOP -------------->
  client <---------------> NAT <---------------> server
         <-- TCP probe -->     <-- TCP probe -->
Question: Why should I use SocketOptions to
configure proftpd's TCP keepalive settings, as opposed to
changing some of the sysctls on my machine that apply to TCP KeepAlives?
Answer: Using the SocketOptions directive
means that your TCP keepalive tunings will only affect the FTP/SFTP connections
to proftpd, instead of applying to all TCP connections to
your machine. FTP/SFTP sessions, being long-lived, probably have different
keepalive timing needs than other TCP connections, so it is best to tune
their settings separately, without impacting all connections to that machine.
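Before overriding anything per-service, it can be useful to see what the system-wide values currently are. The sketch below assumes a Linux machine, where the relevant sysctls are exposed as the files tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_intvl under /proc/sys/net/ipv4; other platforms use different names, in which case every value comes back as None. The function name is hypothetical.

```python
from pathlib import Path

# Linux-specific assumption: keepalive sysctls live under this directory.
SYSCTL_DIR = Path("/proc/sys/net/ipv4")

SYSCTL_NAMES = {
    "idle": "tcp_keepalive_time",       # seconds before the first probe
    "count": "tcp_keepalive_probes",    # unanswered probes before giving up
    "interval": "tcp_keepalive_intvl",  # seconds between probes
}


def read_system_keepalive_defaults(base=SYSCTL_DIR):
    """Return the system-wide idle/count/interval values, or None for any
    value that cannot be read on this platform."""
    values = {}
    for key, name in SYSCTL_NAMES.items():
        try:
            values[key] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values
```

Comparing these numbers with the idle/count/interval values given to SocketOptions shows exactly how far the FTP/SFTP tuning departs from what every other service on the machine uses.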
Question: If TCP keepalives are so useful, why not
tune them to be quite short? Why does RFC 1122 recommend a default of
two hours before checking if the peer is still there?
Answer: There are a couple of reasons why you might
not want to tune your TCP keepalive settings to be shorter.
First, keep in mind that TCP was designed to keep the TCP-level connection "alive" even though various parts of the underlying hardware, such as routers and firewalls, might crash and be rebooted in the middle of things. This is why TCP retransmits lost packets, routes around unresponsive hops, etc. If, then, your TCP keepalive settings are aggressively short, your TCP connection may be shut down (because the peer's responses to TCP keepalive probes did not arrive in time) as a result of a crashed network component. The recommended default of two hours allows for such network route changes without losing the TCP connection.
Second, every TCP keepalive probe counts as more data transferred over your network link. Some links may have exorbitantly high rates for transferred bytes (think satellite links), so on such links, keeping the number of bytes transferred down amounts to cost savings. Tuning TCP keepalive to use shorter times on such links means more data transferred, thus higher costs. On other links, where data transfer rates are cheap, the additional bytes transferred due to shorter TCP keepalive settings may be negligible.
Last, many people argue that use of TCP keepalives may consume unnecessary bandwidth. The question becomes "If no one is using the connection, who cares if the connection is still good?" (To be fair, this argument makes more sense when the connected peers are not separated by proxies, gateways, and NATs.)