On Fri, Sep 14, 2007 at 07:48:45AM -0400, Keith Moore wrote:
> [sorry, lost attribution here]
> > TCP protects you from lots of stuff, but it doesn't really let you
> > recover from the remote endpoint rebooting, for example...
> well, duh. if the endpoint fails then all of the application-level
> state goes away. TCP can't be responsible for recovering from the loss
> of higher-level state. but we're not talking about endpoint failures,
> we're talking about the failure of the network. TCP is supposed to
> recover from transient network failures. it wasn't designed to cope
> with endpoint address changes, of course, because the network as
> designed wasn't expected to fail in that way.

When I was first learning about networking back in the mid-1980s, I
worked on a project involving mobile hosts. The hosts were permitted
to change their IP addresses, but TCP-level connectivity needed to
remain intact. The loss of a route to some network (or to a host
within that network) might trigger an ICMP unreachable, but the
applications (e.g. telnet, ftp) needed to be rewritten so that they
would not close in such a situation.

It seemed reasonable to treat something like a net or host unreachable
as a transient condition, and to allow the application to proceed as
if nothing serious had happened. When routing connectivity could be
restored quickly, the state maintained at both ends of the TCP
connection would allow the application to proceed normally.

However, this practice doesn't seem to have made it into the
application-writing community at large, because lots of applications
fail for just this reason. I wonder if writing a BCP about this even
makes sense at this point, because the application writers (or the
authors of the references the application writers use) may never see
the draft, or even be aware that it's something they should check for.

> > (And something that's common in today's IPv4 deployments: NAT
> > timeouts. I got bitten by that in Chicago, I think they were only a
> > few minutes in my hotel, drove me insane because anything other than
> > HTTP didn't work for long.)
> given that NATs violate the most fundamental assumption behind IP (that
> an address means the same thing everywhere in the network), it's hardly
> surprising that they break TCP.

After installing a NAT firewall/router, I noticed my ssh connections
would drop when left idle for a while. That never happened before -- I
could go away from my machine for hours, and as long as the client and
server machines were up, with no network dynamics, everything would
work fine when I returned. But is it TCP itself that's failing, or is
ssh interpreting the timeout as a non-transient condition and telling
TCP to close?

I think a reasonable compromise for application writers who are
concerned about allocating resources to connections that might really
need to close (e.g. because the remote end really did crash, or there
was a really long timeout) is to allow the user to specify what the
application should do when a level 3 error condition occurs.

--gregbo

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
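
To make the last suggestion a bit more concrete, here is a rough
sketch of what a user-selectable policy might look like inside an
application. Everything in it is an assumption made for illustration:
the policy names, the decide() helper, and the 300-second default are
hypothetical, and exactly when a given stack reports ENETUNREACH or
EHOSTUNREACH to the application varies between systems.

```python
import errno

# errno values that typically result from ICMP net/host unreachable messages.
UNREACHABLE_ERRNOS = frozenset({errno.ENETUNREACH, errno.EHOSTUNREACH})

# Hypothetical policy names a user could select, e.g. via a command-line option.
POLICIES = ("close", "wait", "wait-with-deadline")


def decide(exc, policy, outage_seconds, deadline_seconds=300.0):
    """Return "keep" to keep the connection open, or "close" to give up.

    exc is the OSError raised by a socket call; outage_seconds is how long
    the application has been seeing unreachable errors on this connection.
    """
    if policy not in POLICIES:
        raise ValueError("unknown policy: %r" % (policy,))
    if exc.errno not in UNREACHABLE_ERRNOS:
        return "close"  # not a level 3 reachability error; treat as fatal
    if policy == "close":
        return "close"
    if policy == "wait":
        return "keep"
    # "wait-with-deadline": tolerate the outage for a bounded time only
    return "keep" if outage_seconds < deadline_seconds else "close"
```

An application would call decide() from wherever it catches OSError on
the connection, and close the socket only when the answer is "close".
The bounded variant is meant for the writer who worries about holding
resources for a peer that really has gone away.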