On Thu, Apr 07 2016, Richard Laager wrote:

> In a separate failover event, I tested accessing NFS over TCP. I do
> *not* get "Received RST segment.". So I conclude that
> tcp_validate_incoming() is not being called.

Thanks for all the details.

The ssh experiment quite convincingly shows that the network
infrastructure is working correctly.

The NFS experiment is strange - the RST doesn't even seem to be
arriving. Yet the tcpdump shows that it did.

> Any thoughts on what that means or where to go from here?

Working back from tcp_validate_incoming(), it is called from two
places.

One is tcp_rcv_state_process(), which handles connections that are not
currently established, so it should be irrelevant.

The other is tcp_rcv_established(). As the RST flag is set, the
fast-path branch will not be taken (as ->pred_flags cannot possibly
contain RST), so it should reach the slow_path: label.

The only things that can stop the code reaching tcp_validate_incoming()
are the "len" being less than 20 (which it isn't) or the tcp checksum
being wrong. The tcpdump showed the checksum as '0', but that could be
due to tcp checksum offload.

You could add some printks in there (after slow_path:) to report when
tcp_checksum_complete_user() fails, particularly for th->rst packets.

Or you could try turning off tcp checksum offloading with

   ethtool --offload DEVICENAME rx off

(I think).

It might help to see a tcpdump trace of the case where the "ssh"
connection was broken successfully, for comparison with the case where
the nfs connection wasn't broken. Or it might not.

NeilBrown
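
The path described above, from tcp_rcv_established() down to
tcp_validate_incoming(), can be sketched as a small userspace model.
This is not kernel code: every name starting with model_ or MODEL_ is
hypothetical, the fast-path test is reduced to a bare flags comparison
(the real one also involves header length, window and sequence checks),
and the minimum length of 20 is simply the figure quoted in the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TCP flag bits as they appear in the header flags byte. */
#define MODEL_FLAG_RST 0x04u
#define MODEL_FLAG_ACK 0x10u

/* Assumption: the predicted flags carry ACK only, so RST can never match. */
#define MODEL_PRED_FLAGS MODEL_FLAG_ACK

/* Assumption: minimum segment length, taken from the "less than 20" remark. */
#define MODEL_MIN_LEN 20u

/* Hypothetical, heavily simplified stand-in for a received segment. */
struct model_segment {
    uint8_t flags;       /* TCP flag bits */
    size_t  len;         /* length of the TCP header plus payload */
    bool    checksum_ok; /* result the checksum verification would give */
};

enum model_outcome {
    MODEL_FAST_PATH,         /* header prediction matched */
    MODEL_DROP_SHORT,        /* dropped: segment too short */
    MODEL_DROP_CSUM,         /* dropped: checksum failure */
    MODEL_VALIDATE_INCOMING  /* reached the tcp_validate_incoming() step */
};

/* Model of the decision order for a segment on an established connection. */
enum model_outcome model_rcv_established(const struct model_segment *seg)
{
    assert(seg != NULL);

    /* Fast path: only taken when the flags match the prediction exactly. */
    if (seg->flags == MODEL_PRED_FLAGS)
        return MODEL_FAST_PATH;

    /* slow_path: the two checks that can stop the segment early. */
    if (seg->len < MODEL_MIN_LEN)
        return MODEL_DROP_SHORT;
    if (!seg->checksum_ok)
        return MODEL_DROP_CSUM;

    return MODEL_VALIDATE_INCOMING;
}
```

In this model a segment with RST set can only miss the
MODEL_VALIDATE_INCOMING outcome by being too short or by failing the
checksum, which is why the checksum is the thing worth instrumenting.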