Linux NFSv4.1 client session seqid sometimes advances by 2

Rick Macklem <rmacklem@xxxxxxxxxxx> · Mon, 12 Apr 2021 20:57:15 +0000

Hi,

While testing a Fedora 30 client (5.2.10 kernel) against a FreeBSD
server (NFSv4.1 mount), I have been simulating a network partition
lasting a few minutes (until the TCP connection goes to SYN_SENT on
the Linux client).

Sometimes, after the network partition heals, the FreeBSD server
replies NFS4ERR_SEQ_MISORDERED.
Looking at the packet trace, the seqid for the slot has advanced by
2 instead of 1. An RPC request for old-seqid + 1 never seems to get
sent.
--> Since sending an RPC with "seqid + 2" but never sending one
       that is "seqid + 1" for a slot seems harmless, I have added an optional
       hack (can be turned off), to allow this case instead of replying
       NFS4ERR_SEQ_MISORDERED for it. The code will still reply
       NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
       "old seqid + 1" in it arrives afterwards.
       --> Yes, doing this hack is a violation of RFC5661, but I've
             done it anyhow.
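
A minimal sketch of the relaxed slot check described above, for
illustration only. It is not the FreeBSD server code: the function
name, the allow_skip switch and the REPLAY marker are hypothetical,
and the "same seqid means replay from the reply cache" branch is the
usual RFC 5661 session behaviour rather than something stated in this
mail. Seqids are treated as 32-bit values that wrap around.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"
REPLAY = "REPLAY"  # hypothetical marker: answer from the slot's cached reply

SEQID_MASK = 0xFFFFFFFF  # seqids are 32-bit and wrap around


def check_slot_seqid(slot_seqid, req_seqid, allow_skip=True):
    """Return (result, new_slot_seqid) for one SEQUENCE op on a slot.

    slot_seqid is the last seqid the server accepted for the slot.
    allow_skip models the optional hack (an assumed name): when set,
    a request carrying slot_seqid + 2 is accepted as a new request.
    """
    if req_seqid == slot_seqid:
        # Retransmission of the last request on this slot.
        return REPLAY, slot_seqid
    if req_seqid == (slot_seqid + 1) & SEQID_MASK:
        # Normal case: the next request for the slot.
        return NFS4_OK, req_seqid
    if allow_skip and req_seqid == (slot_seqid + 2) & SEQID_MASK:
        # The hack: tolerate a client that skipped one seqid.
        return NFS4_OK, req_seqid
    # Everything else, including a late "old seqid + 1" that shows up
    # after "old seqid + 2" was accepted, is still rejected.
    return NFS4ERR_SEQ_MISORDERED, slot_seqid
```

With the slot at seqid 4, a request with seqid 6 is accepted and the
slot moves to 6; a seqid 5 request arriving after that is rejected,
because 5 is neither 6, 7 nor 8.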

If you are interested in a packet capture with this in it:
fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
- then look at packets #1945 and #2072
  --> You'll see that slot #1 seqid goes from 4 to 6. There is no
         slot#1 seqid 5 RPC sent on the wire.
         (This packet capture was taken on the Linux client using
          tcpdump.)
--> Btw, the "RST battle" you'll see in the above trace between
       #2005 and #2068 that goes on until the FreeBSD
       krpc/NFS times out the connection after 6min. seems to be a recent
       FreeBSD TCP bug.
       I have reproduced this "seqid advances by 2" problem on an older
       system that does not "RST battle" and allows the reconnect right
       away once the network partition is healed, so the RST battle does
       not seem to be relevant to this bug.
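
For anyone checking a trace of their own, here is a rough sketch of
how a per-slot seqid jump like the one above could be flagged. It
assumes the (packet number, slot, seqid) triples have already been
pulled out of the capture by some other means and that they all belong
to one session; the function name and record layout are made up for
this example. The sample records assume packet #1945 carries slot 1
seqid 4 and packet #2072 carries slot 1 seqid 6, which is how I read
the description above.

```python
def find_seqid_gaps(records):
    """Report slots whose seqid changed by anything other than 0 or +1.

    records: iterable of (packet_number, slot, seqid) in capture order.
    Returns a list of (slot, prev_packet, prev_seqid, packet, seqid).
    """
    last = {}  # slot -> (packet_number, seqid) of the previous request
    gaps = []
    for pkt, slot, seqid in records:
        if slot in last:
            prev_pkt, prev_seqid = last[slot]
            # 32-bit wrap-around difference: 0 is a retransmission,
            # 1 is the normal next request, anything else is suspect.
            step = (seqid - prev_seqid) & 0xFFFFFFFF
            if step > 1:
                gaps.append((slot, prev_pkt, prev_seqid, pkt, seqid))
        last[slot] = (pkt, seqid)
    return gaps


# Assumed reading of the two packets named in this mail.
records = [(1945, 1, 4), (2072, 1, 6)]
for slot, p0, s0, p1, s1 in find_seqid_gaps(records):
    print(f"slot {slot}: seqid {s0} (pkt #{p0}) -> {s1} (pkt #{p1})")
```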

Someday, I will get around to upgrading to a more recent Linux
kernel and will test to see if I can still reproduce this bug.
On 5.2.10, it is intermittent and does not occur every time I
do the network partitioning test.

Mostly just fyi, rick