Re: PROBLEM: NFS Client Ignores TCP Resets

NeilBrown <nfbrown@xxxxxxxxxx> · Sun, 03 Apr 2016 13:58:59 +1000

On Sun, Feb 14 2016, Richard Laager wrote:

> [1.] One line summary of the problem:
>
> NFS Client Ignores TCP Resets
>
> [2.] Full description of the problem/report:
>
> Steps to reproduce:
> 1) Mount NFS share from HA cluster with TCP.
> 2) Failover the HA cluster. (The NFS server's IP address moves from one
>     machine to the other.)
> 3) Access the mounted NFS share from the client (an `ls` is sufficient).
>
> Expected results:
> Accessing the NFS mount works fine immediately.
>
> Actual results:
> Accessing the NFS mount hangs for 5 minutes. Then the TCP connection 
> times out, a new connection is established, and it works fine again.
>
> After the IP moves, the new server responds to the client with TCP RST 
> packets, just as I would expect. I would expect the client to tear down 
> its TCP connection immediately and re-establish a new one. But it 
> doesn't. Am I confused, or is this a bug?
>
> For the duration of this test, all iptables firewalling was disabled on 
> the client machine.
>
> I have a packet capture of a minimized test (just a simple ls):
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1542826/+attachment/4571304/+files/dovecot-test.upstream-kernel.pcap

I notice that the server sends packets from a different MAC address to
the one it advertises in ARP replies (and the one the client sends to).
This is probably normal - maybe you have two interfaces bonded together?

Maybe it would help to be explicit about the network configuration
between client and server - are there switches?  Software or hardware?

Where is tcpdump being run?  On the (virtual) client, or on the
(physical) host or elsewhere?

As you say, everything looks perfect until the server sends an RST and
the client appears to ignore it.  The from/to addresses are all
identical to those on the subsequent SYN/ACK which is not ignored so it
seems unlikely that the SYN/ACK would get through but not the RST.

This bug (it is definitely a bug somewhere) looks suspiciously similar
to the one fixed by
Commit: 7b514a886ba5 ("tcp: accept RST without ACK flag")
but that was fixed 3 years ago - a temporary bug in v3.8.  I cannot see
any evidence that it has crept back.

Can you create a TCP connection to some other port on the server
(telnet? ssh? http?) and see what happens to it on fail-over?
You would need some protocol that the server won't quickly close.
Maybe just "telnet SERVER 2049" and don't type anything until after the
failover.

If that closes quickly, then maybe it is an NFS bug.  If that persists
for a long timeout before closing, then it must be a network bug -
either in the network code or the network hardware.
In that case, netdev@xxxxxxxxxxxxxxx might be the best place to ask.
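
For anyone who would rather script the idle-connection test than use
telnet, the following is a minimal sketch, not a definitive tool. The
function name hold_idle_connection and its parameters are made up for
illustration. It opens a TCP connection, stays silent while the failover
is triggered by hand, then sends one byte and reports how the connection
ended and how long the whole thing took.

```python
import socket
import time


def hold_idle_connection(host, port=2049, idle_seconds=60, timeout=30):
    """Open a TCP connection, stay idle, then send one byte and report the outcome.

    Hypothetical helper: trigger the failover by hand during the idle period.
    Returns (outcome, elapsed_seconds).
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)  # fail the cluster over during this pause
        try:
            sock.sendall(b"\n")   # first traffic after the failover
            data = sock.recv(1)   # wait for a reply, an EOF, or an error
            outcome = "closed by peer" if data == b"" else "data received"
        except ConnectionResetError:
            outcome = "reset"     # the RST was seen by the local TCP stack
        except socket.timeout:
            outcome = "timeout"   # nothing came back within `timeout` seconds
        except OSError as exc:
            outcome = "error: %s" % exc.strerror
    return outcome, time.monotonic() - start
```

Something like hold_idle_connection("SERVER", 2049, idle_seconds=120)
would be the equivalent of the telnet test. A quick "reset" suggests the
network layer handles the RST for an ordinary socket; a "timeout" or a
long delay points at the network rather than NFS.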

Looking at the debug logs, the most interesting (to me) part is

2016-03-11T03:27:24.897463-06:00 imap1 kernel:
  [  479.708050] RPC:       xs_error_report client ffff88003cfe4000, error=113...

error 113 is EHOSTUNREACH.  This strongly suggests that the network layer
didn't "see" the RST and has only broken the connection because it isn't
getting a reply from the server for its GETATTR retransmissions.
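
As a quick way to check an errno value from a log like this, here is a
small sketch (describe_errno is a hypothetical helper name). Note that
errno numbers are platform-specific: 113 is the Linux value, so the
lookup should be done on a Linux machine.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")  # e.g. "EHOSTUNREACH"
    return name, os.strerror(code)
```

On Linux, describe_errno(113) should give the EHOSTUNREACH name along
with the system's message text for it.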

If you were up to building your own kernel, I would suggest putting some
printks in tcp_validate_incoming() (in net/ipv4/tcp_input.c).

Print a message if th->rst is ever set, and another if the
tcp_sequence() test causes it to be discarded.  It shouldn't, but
something seems to be discarding it somewhere...

NeilBrown