Re: Protocol for TCP heartbeats?

Ted,

The obvious problem is that heartbeats can thus sit in the transmit buffer waiting to be delivered. They can even be retransmitted, and so on. In any case, the functionality they are supposed to provide is heavily distorted.
FWIW, I don't think it matters if the keepalives are stuck in a TCP
transmit buffer or in a multi-continent routing loop.  If the
application needs to hear from its peer every N seconds and it doesn't,
they're disconnected.
Yes. That's true for the dumb keepalive algorithm described in the previous email.

However, if you want something more sensible (presumably something like SCTP's heartbeats), you need to take the current RTO into account. That's something you can't do on top of TCP.
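To make "taking the current RTO into account" concrete, here is a rough sketch of the usual smoothed-RTT calculation (in the style of RFC 6298) feeding an SCTP-like heartbeat delay of RTO plus a base interval. The class and function names, the default bounds, and the omission of clock granularity and jitter are all assumptions made for illustration. It also assumes RTT samples are available, which is exactly what an application sitting on top of TCP does not get.

```python
class RtoEstimator:
    """Smoothed RTO estimate in the style of RFC 6298 (illustrative sketch)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance

    def __init__(self, rto_initial=1.0, rto_min=1.0, rto_max=60.0):
        # Default bounds are assumptions chosen for the example.
        self.srtt = None
        self.rttvar = None
        self.rto = rto_initial
        self.rto_min = rto_min
        self.rto_max = rto_max

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement seeds both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        # Clock granularity is ignored here for brevity.
        self.rto = min(self.rto_max, max(self.rto_min, self.srtt + 4 * self.rttvar))
        return self.rto


def heartbeat_delay(rto, hb_interval):
    """Delay before the next heartbeat: current RTO plus a base interval (jitter omitted)."""
    return rto + hb_interval
```

On a fast path the delay stays close to the base interval; on a slow or lossy path it stretches with the RTO, so the detector does not declare failure faster than the path can answer.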

You're missing my point, perhaps because I'm being unclear.  Let me try
it this way:

If an application needs a heartbeat, it almost always needs to be an
application to application (layer 7 to layer 7) heartbeat.

Imagine you have a perfect TCP heartbeat algorithm: detects the
existence of a running TCP instance on both endpoints perfectly
accurately at the timescales you care about.  Now one of your
application endpoints deadlocks itself - it hangs, spins, whatever - but
the process is alive and the TCP connections are open.  The application
is not responding.  The TCP timeout won't help you at all; the TCP
connection is fine.

Of course the way to deal with that is a layer 7 heartbeat.
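A minimal sketch of such an application-to-application check, assuming the peer is expected to say something at least every N seconds. `PeerMonitor` and its method names are hypothetical, and the clock is injectable only so the logic can be exercised without waiting.

```python
import time


class PeerMonitor:
    """Tracks when the peer application was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_heard = clock()

    def on_message(self):
        # Call for every application-level message, heartbeat or not:
        # any message shows the peer application is still making progress.
        self._last_heard = self._clock()

    def peer_alive(self):
        # A hung peer stops sending even though its TCP connection stays open,
        # so silence past the deadline is treated as failure.
        return (self._clock() - self._last_heard) <= self.timeout_s
```

The receive loop would call `on_message()` for each message it processes, and a periodic timer would check `peer_alive()` to decide when to start failover.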

My point is that if you need that layer 7 heartbeat, the layer 4 (TCP)
one doesn't help much.  I can't think of an application that needs the
TCP heartbeat and not the application heartbeat.  (There probably is
one; my point is that needing both is the common case.)

Right. Layer 7 heartbeats are definitely needed to detect whether the application has hung.

However, detecting an application hang is a problem orthogonal to detecting the unavailability of the network peer. Being able to detect network unavailability is valuable in itself (i.e. the application can start its failover procedure in a timely fashion).

Those that need hang detection can obviously implement heartbeats at layer 7, but that's beside the point here.

So, TCP designers could create a highly parameterized heartbeat timer
(every application has its own idea what a timeout is) and put all that
complexity into the TCP protocol.

No complexity is needed IMO. Consider the following:

1. Keepalives are already defined in RFC 1122 (section 4.2.3.6).

2. There are no interoperability issues. With an SCTP-like heartbeat mechanism, each peer manages its own failure detection and no extra effort on behalf of the other side is needed. Thus implementations with failure detection would work perfectly well with implementations without it.

3. There are no congestion control issues. The keepalives are data and thus should adhere to TCP's congestion control mechanism. When the peer is unreachable, the keepalives would back off gracefully.
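As a rough illustration of the back-off in point 3, assuming keepalives are treated like any other retransmitted data, each unanswered probe would double the wait up to a cap. The function name and the 60-second cap are assumptions made for the example.

```python
def keepalive_backoff(rto, attempts, rto_max=60.0):
    """Return the wait before each of `attempts` successive unanswered probes."""
    delays = []
    for _ in range(attempts):
        delays.append(rto)
        # Exponential back-off: double the timer after each miss, up to the cap.
        rto = min(rto * 2, rto_max)
    return delays
```

For example, `keepalive_backoff(1.0, 5)` gives `[1.0, 2.0, 4.0, 8.0, 16.0]`, so an unreachable peer sees rapidly thinning traffic rather than a steady stream of probes.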

AFAICS the only thing preventing the specification of an optional TCP heartbeat mechanism is the set of artificial restrictions in RFC 1122 section 4.2.3.6, such as the "no less than two hours" rule.
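For reference, some stacks already let an application turn on keepalives and shorten the timers per socket. The sketch below assumes a platform where Python's `socket` module exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (Linux does; others may not, so missing options are skipped). The function name and timer values are illustrative only.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=5, probes=3):
    """Turn on TCP keepalives and, where supported, shorten the timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tunables:
        opt = getattr(socket, name, None)  # not every platform defines these
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With these example values a dead peer would be noticed after roughly 30 + 3 × 5 = 45 seconds of silence, though the timers are fixed and take no account of the RTO.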

It's interesting to look at the rationale in RFC 1122:

"The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?"); and (3) cost money for an Internet path that charges for packets."

(1) Is exactly what you require for high-availability solutions.
(2) A high-availability solution does care.
(3) True, but those that need it are happy to pay the extra cost.

Martin