Re: Linux NFSv4.1 client session seqid sometimes advances by 2

On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem <rmacklem@xxxxxxxxxxx> wrote:
>
> Hi,
>
> During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> server (4.1 mount), I have been simulating a network partitioning
> for a few minutes (until the TCP connection goes to SYN_SENT on
> the Linux client).
>
> Sometimes, after the network partition heals, the FreeBSD server
> replies NFS4ERR_SEQ_MISORDERED.
> Looking at the packet trace, the seqid for the slot has advanced by
> 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> sent.
> --> Since sending an RPC with "seqid + 2" but never sending one
>        that is "seqid + 1" for a slot seems harmless, I have added an optional
>        hack (can be turned off), to allow this case instead of replying
>        NFS4ERR_SEQ_MISORDERED for it. The code will still reply
>        NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
>        "old seqid + 1" in it arrives afterwards.
>        --> Yes, doing this hack is a violation of RFC5661, but I've
>              done it anyhow.
>
> If you are interested in a packet capture with this in it:
> fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> - then look at packet #1945 and #2072
>   --> You'll see that slot #1 seqid goes from 4 to 6. There is no
>          slot#1 seqid 5 RPC sent on the wire.
>          (This packet capture was taken on the Linux client using
>           tcpdump.)
> --> Btw, the "RST battle" you'll see in the above trace between
>        #2005 and #2068 that goes on until the FreeBSD
>        krpc/NFS times out the connection after 6min. seems to be a recent
>        FreeBSD TCP bug.
>        I have reproduced this "seqid advances by 2" behavior on an
>        older system that does not "RST battle" and allows the reconnect
>        right away, once the network partition is healed, so the RST
>        battle does not seem to be relevant to this bug.
>
> Someday, I will get around to upgrading to a more recent Linux
> kernel and will test to see if I can still reproduce this bug.
> On 5.2.10, it is intermittent and does not occur every time I
> do the network partitioning test.
>
> Mostly just fyi, rick
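
For anyone following along, the slot check described in the quoted message can be sketched roughly as below. This is only a toy model in Python, not the FreeBSD server code: the names Slot, check_sequence and allow_skip_one are made up for illustration, and REPLAY is just a label for "return the cached reply". The strict branch follows the RFC 5661 rule (same seqid is a replay, seqid + 1 is a new request, anything else is misordered); the relaxed branch models the optional hack.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
REPLAY = "REPLAY"  # hypothetical label: server would return the cached reply
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


@dataclass
class Slot:
    seqid: int  # seqid of the last request executed on this slot


def check_sequence(slot, req_seqid, allow_skip_one=False):
    """Classify a request against the slot's current seqid.

    allow_skip_one is an assumed name for the optional hack: when set,
    a request with slot.seqid + 2 is accepted as if it were + 1.
    """
    # seqids are 32-bit and wrap around, so compare modulo 2**32.
    delta = (req_seqid - slot.seqid) & 0xFFFFFFFF
    if delta == 0:
        return REPLAY
    if delta == 1 or (allow_skip_one and delta == 2):
        slot.seqid = req_seqid
        return NFS4_OK
    return NFS4ERR_SEQ_MISORDERED


strict = Slot(seqid=4)
print(check_sequence(strict, 6))                        # strict RFC check
relaxed = Slot(seqid=4)
print(check_sequence(relaxed, 6, allow_skip_one=True))  # hack enabled
print(check_sequence(relaxed, 5, allow_skip_one=True))  # late "old seqid + 1"
```

In this model, once the seqid 6 request has been accepted, a late request carrying seqid 5 is one behind the slot and is still rejected as misordered.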

Hi Rick,

I think this is happening because slotid=1 had a request queued up
using seqid=5, and that request was interrupted when the connection
was reset (RST). For an interrupted slot, the client sends a solo
SEQUENCE with the seqid incremented by 1, which would explain why
seqid=6 follows seqid=4 on the wire.
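
A rough way to picture this, as a toy model only (this is not the Linux client code; wire_seqids and its parameters are hypothetical names, and it assumes the interrupted request never reached the wire, as in the capture):

```python
def wire_seqids(last_seqid, interrupted):
    """Return the (operation, seqid) pairs a packet capture would show
    for one slot, starting after a request with last_seqid completed."""
    wire = []
    # The next request on the slot is assigned last_seqid + 1 (32-bit wrap).
    seqid = (last_seqid + 1) & 0xFFFFFFFF
    if not interrupted:
        wire.append(("COMPOUND", seqid))
        return wire
    # Assumption: the connection is reset before the request is transmitted,
    # so this seqid is consumed on the client but never seen on the wire.
    # For the interrupted slot the client then sends a solo SEQUENCE
    # with the seqid bumped by one more.
    seqid = (seqid + 1) & 0xFFFFFFFF
    wire.append(("SEQUENCE", seqid))
    return wire


print(wire_seqids(4, interrupted=False))
print(wire_seqids(4, interrupted=True))
```

With last_seqid=4 and an interruption, the only thing the capture shows for the slot is a SEQUENCE carrying seqid 6, which matches the jump from 4 to 6 in the trace.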
