Question on RPC_TASK_NO_RETRANS_TIMEOUT / NFS_CS_NO_RETRANS_TIMEOUT for NFSv3

Hi, there!

We have some shares that use NFSv3 with TCP and Kerberos and have been
hitting an intriguing issue with those. We have noticed that network
instabilities have been causing some 'Permission denied' errors on
files.

The scenario is NFSv3-over-TCP clients configured with Kerberos
(krb5p) authentication against a NetApp NFS server (ONTAP). This
happens regardless of the kernel we use (our main tests ran on the
4.15 and 5.15 generic Ubuntu kernels, from Bionic and Jammy), and we
have not found any interesting commits in either component upstream
that would change the behaviour at hand.

We tracked those issues down and found out that the 'Permission
denied' happens because our packets are failing the GSS checksum[1].
We kept investigating and discovered, from tcpdump captures, that this
happens because the client retransmits RPC requests, and each
retransmission uses a new GSS sequence number. Meanwhile, the response
to the original request arrives, but its checksum fails because the
client is by then expecting a different GSS sequence number.
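
To make the failure mode concrete, here is a toy model of the race
described above. It is only a sketch: the HMAC stands in for the real
krb5 checksum, and SESSION_KEY, ToyClient, mic() and server_reply()
are hypothetical names, not anything from the kernel's SUNRPC code.

```python
import hashlib
import hmac

# Hypothetical stand-in for the key of the established GSS context.
SESSION_KEY = b"hypothetical-session-key"


def mic(seq: int) -> bytes:
    """Toy integrity check over a sequence number (HMAC, not real krb5)."""
    return hmac.new(SESSION_KEY, seq.to_bytes(4, "big"), hashlib.sha256).digest()


class ToyClient:
    """Tracks only the sequence number of the most recent transmission."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.request_seq = None

    def transmit(self) -> int:
        # Every (re)transmission takes a fresh sequence number.
        self.request_seq = self.next_seq
        self.next_seq += 1
        return self.request_seq

    def verify_reply(self, verifier: bytes) -> bool:
        # The reply is checked against the *latest* sequence number.
        return hmac.compare_digest(verifier, mic(self.request_seq))


def server_reply(seq: int) -> bytes:
    """The server signs the sequence number of the request it answers."""
    return mic(seq)


client = ToyClient()
first = client.transmit()          # original request
retrans = client.transmit()        # timeout fired, request sent again
late_reply = server_reply(first)   # server answers the original request

reply_ok = client.verify_reply(late_reply)
retrans_ok = client.verify_reply(server_reply(retrans))
print("reply to original accepted:", reply_ok)
print("reply to retransmission accepted:", retrans_ok)
```

In this model the late reply to the original request is rejected,
while a reply to the retransmission would have been accepted, which
matches what we see on the wire.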

This can be avoided with NFSv4 because the RPC client is created with
a "no retrans timeout" flag[2]. That flag is not set for NFSv3, and
there is no way to set it. After some investigation, we think that
setting this flag would fix our problems without the need to move to
NFSv4.
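
As a rough illustration of the behaviour we expect from that flag,
the sketch below models the timeout decision only. It reflects our
reading of the flag's intent, not the actual kernel logic; ToyRequest
and on_timeout() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    """Hypothetical model of one in-flight RPC request."""
    no_retrans_timeout: bool   # models the "no retrans timeout" flag
    transmissions: int = 1


def on_timeout(req: ToyRequest, connection_intact: bool) -> str:
    """Decide what to do when the request's timer expires.

    Assumption: with the flag set, a timeout alone does not trigger a
    resend while the TCP connection is still up.
    """
    if req.no_retrans_timeout and connection_intact:
        return "wait"
    req.transmissions += 1
    return "retransmit"


v3_like = ToyRequest(no_retrans_timeout=False)
v4_like = ToyRequest(no_retrans_timeout=True)

v3_action = on_timeout(v3_like, connection_intact=True)
v4_action = on_timeout(v4_like, connection_intact=True)
v4_broken = on_timeout(v4_like, connection_intact=False)
print(v3_action, v4_action, v4_broken)
```

Under this assumption, the request without the flag is resent (and
would take a new GSS sequence number), while the one with the flag
keeps waiting for the original reply unless the connection breaks.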

Our question is: is there a reason this flag is neither set nor
settable for NFSv3? Is there something in NFSv3 that demands RPC
retransmissions even over TCP? One "hint" we have come across is that
avoiding retransmissions on an intact connection is *explicitly
mentioned* in NFSv4's RFC [3], while NFSv3 says nothing about it at
all - most likely because we're dealing with a stateless protocol.

Any comments would be greatly appreciated here!

Thank you,

[1] https://github.com/torvalds/linux/blob/v5.15/net/sunrpc/auth_gss/gss_krb5_unseal.c#L194
[2] https://github.com/torvalds/linux/blob/v5.15/fs/nfs/client.c#L521
[3] https://datatracker.ietf.org/doc/html/rfc7530#section-3.1.1

--
Pedro Principeza
Senior Sustaining Operations Engineer


