I'd like to thank everyone for their replies and advice. I'm sorry it took so long for me to respond; I took a long weekend after a long shift. Some remaining questions can be found in the final section of this posting.

The summary (I hope I have all of this correct):

Problem: A DOS box (client) connects to a Linux box (server) using the same local port (1025) on the client each time. The client sends data, which the server reads; the server is passive and does not write any data. If the client crashes and fails to properly close the connection, then under CentOS 6.5 the unclosed listener on the server receives a 0-length recv(), allowing for a "clean" reconnect; under 6.6 it does not, and the client retries the reconnect endlessly without success.

Diagnosis: Because the client is connecting using the same port every time, the server sees the same 5-tuple each time. At that point, the reconnection should fail until the old socket on the server is closed, and the previous behavior of receiving a 0-length recv() on the old server socket is unsupported and unreliable.

Until the update to CentOS 6.6 'broke' the existing functionality, I had never looked deeply into the connection between the client and the server; it 'just worked', so I left it alone. Once it did break, I realized that because the client was connecting on the same port every time, the whole setup might have been relying on unsupported behavior.

My workaround: I unfortunately had to implement an emergency workaround before receiving any replies. Fortunately, the client also sends status messages to the same computer (but a different server program) over a serial-port side-channel (well, it's more complicated than that, but anyway). I set up a listener for a "failed connection" status message, which signal()s the server program to close all client connections (but not the bound dispatchers) and thereby forces all clients to reconnect. It's a cheat and a cheesy hack, but it works.
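
For anyone curious, the workaround above could look roughly like the following. This is only a minimal Python sketch under stated assumptions, not the actual server code: the class name Server, its methods, the serve_forever() helper, and the choice of SIGUSR1 are all hypothetical. The signal handler only sets a flag; the main loop then closes every accepted client socket while leaving the listening socket bound, so clients can reconnect.

```python
import select
import signal
import socket


class Server:
    """Hypothetical passive server: it reads from clients and never writes."""

    def __init__(self, host="127.0.0.1", port=0):
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.listener.bind((host, port))
        self.listener.listen(5)
        self.clients = []            # accepted connections only
        self.drop_requested = False

    def request_drop(self, signum=None, frame=None):
        # Usable as a signal handler: just set a flag, do the work in poll().
        self.drop_requested = True

    def drop_clients(self):
        # Close every accepted connection; the listener stays bound.
        for conn in self.clients:
            conn.close()
        self.clients.clear()
        self.drop_requested = False

    def poll(self, timeout=1.0):
        """Run one pass of the main loop."""
        if self.drop_requested:
            self.drop_clients()
        readable, _, _ = select.select(
            [self.listener] + self.clients, [], [], timeout
        )
        for sock in readable:
            if sock is self.listener:
                conn, _addr = self.listener.accept()
                self.clients.append(conn)
                continue
            try:
                data = sock.recv(4096)
            except OSError:
                data = b""           # treat a reset like a close
            if not data:
                sock.close()
                self.clients.remove(sock)
            # else: hand the data to the application (omitted here)


def serve_forever(server):
    # Assumption: the status-message listener sends SIGUSR1 to this process.
    signal.signal(signal.SIGUSR1, server.request_drop)
    while True:
        server.poll()
```

In this sketch, something like "kill -USR1 <pid>" from the status-message listener would make the next pass of poll() drop all clients. Doing the close in the main loop rather than inside the handler keeps the handler trivial and avoids touching the socket list while select() is using it.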

Other diagnostics: One test I intend to run in a couple of weeks (next opportunity) is to boot the CentOS 6.6 box with the older kernel, in order to find out whether the behavior change is in the kernel or in the libraries.

Correct solutions:

1) Client port: The client should be connecting on a random, ephemeral port like a good client instead of on a fixed port, as I suspected. I don't know if this can be changed (due to a really dumb binary TCP driver).

2) Protocol change: The server never writes to the socket in the existing protocol, and can therefore never find out that the connection is dead. Writing to the socket would reveal this. But what happens if the server writes to the socket, and the client never reads? (We do, as it happens, have access to the client software, so the protocol can be fixed eventually. But I'm still curious as to the answer.)

3) Several people suggested using SO_REUSEADDR and/or an SO_LINGER of zero to drop the socket out of TIME_WAIT, but does the socket enter TIME_WAIT as soon as the client crashes? I didn't think so, but I may be wrong.

4) Several people suggested SO_KEEPALIVE, but keepalive probes are sent only after hours of idle time unless you change kernel parameters via procfs and/or sysctl, and when the client crashes, I need recovery right away, not hours down the road. Time here is literally worth a dollar per second, roughly.

Anyway, thanks for the discussion and helpful links. At one time I knew all this stuff, but it has been 20 years since I had to dig into the TCP protocol this deeply.

-G.
--
Glenn Eychaner (geychaner@xxxxxx)
Telescope Systems Programmer, Las Campanas Observatory