Re: Socket behavior change from 6.5 to 6.6

Gordon Messmer <gordon.messmer@xxxxxxxxx> · Wed, 21 Jan 2015 10:09:55 -0800

On 01/21/2015 08:49 AM, Glenn Eychaner wrote:
> Diagnosis:
> the previous behavior of receiving a 0-length recv() on the old server
> socket is unsupported and unreliable.

You mention that a lot, and it might help to understand why that happens.

A 0-length recv() on a standard (blocking) socket indicates end-of-file:
the remote side has closed the connection.

What you were previously seeing was the client sending SYN to establish
a new connection.  Because it was unrelated to the existing connection
on the same 5-tuple, the server's TCP stack closed the existing socket.
I'm not positive, but the server may have sent a keepalive or other
probe to the client and got a RST.  Either way, the kernel determined
that the socket had been closed by the client, and a 0-length read
(recv) is the way that the kernel informs an application of that closure.
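
If it helps to see the end-of-file signal in isolation, here is a minimal
Python sketch on a loopback connection. The function name is made up for
illustration, and it only demonstrates an orderly close by the client, not
the stray-SYN case described above.

```python
import socket

def recv_after_peer_close():
    """Return what the server-side recv() yields after the client closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    client.close()                    # orderly close: the client sends FIN
    data = conn.recv(4096)            # blocks until the FIN arrives

    conn.close()
    listener.close()
    return data

if __name__ == "__main__":
    print(repr(recv_after_peer_close()))   # empty bytes object means EOF
```

An empty result from recv() is the only notification the application gets;
no exception is raised for a clean close.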

> Until the update to CentOS 6.6 'broke' the existing functionality,
> I had never looked deeply into the connection between the client and the
> server; it 'just worked', so I left it alone. Once it did break, I realized
> that because the client was connecting on the same port every time, the
> whole setup might have been relying on unsupported behavior.

Not just unsupported, but incorrect.  Unrelated packets with a 5-tuple 
matching an established socket are typically injection attacks.  TCP is 
supposed to discard them.

> Other diagnostics:
> One test I intend to run in a couple of weeks (next opportunity) is to boot
> the CentOS 6.6 box with the older kernel, in order to find out whether the
> behavior change is in the kernel or in the libraries.

It's always good to test, but it's almost certainly the kernel. 
Libraries don't decide whether or not a socket has closed, which is what 
the 0-length read (recv) indicates.

> Correct solutions:
> 1) Client port: The client should be connecting on a random, ephemeral port

Yes.
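
In practice that usually just means not calling bind() on the client
socket before connect(). A rough Python sketch, assuming a plain TCP
client (the function name and the throwaway listener are only for
illustration, not taken from your code):

```python
import socket

def connect_with_ephemeral_port(host, port):
    """Connect without binding first, so the kernel chooses the source port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # No bind() here.  A client that binds to a fixed local port reuses the
    # same 5-tuple on every reconnect to the same server.
    sock.connect((host, port))
    return sock

if __name__ == "__main__":
    # Throwaway loopback listener standing in for the real server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(2)
    host, port = listener.getsockname()

    first = connect_with_ephemeral_port(host, port)
    second = connect_with_ephemeral_port(host, port)
    # Two live connections to the same server get different source ports.
    print(first.getsockname()[1], second.getsockname()[1])

    first.close()
    second.close()
    listener.close()
```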

> 2) Protocol change: The server never writes to the socket in the existing
> protocol, and can therefore never find out that the connection is dead.
> Writing to the socket would reveal this. But what happens if the server writes
> to the socket, and the client never reads?

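You will eventually fill up a buffer on one side or the other, and at
that point any further write (send) on a blocking socket will block
until the client reads some data, which may be never.  On a non-blocking
socket the write fails with EAGAIN instead.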
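
You can watch the buffers fill without hanging a process by using a
non-blocking socket. This is only a sketch on loopback with a made-up
function name; the byte count it returns depends on the kernel's buffer
tuning, so treat the number as illustrative.

```python
import socket

def bytes_until_send_would_block(chunk_size=65536):
    """Write to a peer that never reads; return how many bytes were accepted."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()     # accepted, but never read from

    writer.setblocking(False)         # raise an error instead of blocking
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += writer.send(chunk)
    except BlockingIOError:
        pass                          # buffers full; a blocking send would hang here
    finally:
        writer.close()
        reader.close()
        listener.close()
    return total

if __name__ == "__main__":
    print(bytes_until_send_would_block(), "bytes accepted before EAGAIN")
```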

> 3) Several people suggested using SO_REUSEADDR and/or an SO_LINGER of zero to
> drop the socket out of TIME_WAIT, but does the socket enter TIME_WAIT as soon
> as the client crashes? I didn't think so, but I may be wrong.

No.  A socket enters TIME_WAIT only after it has been closed, and only
on the side that closed first.  If the socket were closing, you'd be
getting a 0-length read (recv).  You can confirm the socket's state
with "netstat".
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos