On Wed, 2023-08-23 at 16:59 -0300, Pedro Principeza wrote:
> Hi, there!
>
> We have some shares that use NFSv3 with TCP and Kerberos and have been
> hitting an intriguing issue with those. We have noticed that network
> instabilities have been causing some 'Permission denied' errors on
> files.
>
> The current scenario we have is based on NFSv3 over TCP clients,
> configured with Kerberos (krb5p) authentication against a NetApp NFS
> Server (ONTAP). This is happening regardless of the Kernel we use
> (our main tests bear 4.15 and 5.15 generic Ubuntu Kernels - from
> Bionic and Jammy), and we have not found any interesting commits in
> either components upstream that would change the behaviour in hand.
>
> We tracked those issues down and found out that the 'Permission
> denied' happens because our packets are failing the GSS checksum[1].
> We kept investigating and discovered, after some tcpdump, that this
> happens because the client retransmits RPC packets, which increases
> the GSS sequence number. Meanwhile, the response to the original
> packet gets received, but the checksum fails because the client is
> expecting a different GSS sequence number.
>
> This can be avoided with NFSv4 because the RPC client is created with
> a "no retrans timeout" flag[2]. Such a flag is not set and is
> impossible to set on NFSv3. We did some investigation and thought that
> setting this flag would fix our problems without the need to move to
> NFSv4.
>
> Our question is: is there a reason this flag is not being set nor is
> it possible to set it for NFSv3? Is there something on NFSv3 that
> demands RPC retransmissions even with TCP? One "hint" we have come
> across is that it is *explicitly mentioned* in NFSv4's RFC [3], and
> there is nothing in NFSv3 at all - most likely due to the fact we're
> dealing with a stateless protocol.
>
> Any comments would be greatly appreciated here!
>
> Thank you,
>
> [1] https://github.com/torvalds/linux/blob/v5.15/net/sunrpc/auth_gss/gss_krb5_unseal.c#L194
> [2] https://github.com/torvalds/linux/blob/v5.15/fs/nfs/client.c#L521
> [3] https://datatracker.ietf.org/doc/html/rfc7530#section-3.1.1

NFSv3 servers are allowed to drop requests, and NFSv3 clients are
expected to retransmit them when this happens.
NFSv4 servers may not drop requests, and NFSv4 clients are expected
never to retransmit (unless the connection breaks).

For that reason we do set RPC_TASK_NO_RETRANS_TIMEOUT on NFSv4 and do
not on NFSv3.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
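
For readers following the thread, the failure mode described in the quoted report can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: `ToyClient`, `mic`, `run` and `SESSION_KEY` are hypothetical names, and an HMAC over the sequence number stands in for the real GSS checksum. It assumes, as the report describes, that each retransmission takes a fresh GSS sequence number and that the client checks a reply against the most recent one it sent.

```python
import hashlib
import hmac

# Hypothetical stand-in for the Kerberos context key (assumption, not a real API).
SESSION_KEY = b"toy-session-key"


def mic(seq: int) -> bytes:
    """Toy substitute for the GSS checksum computed over a sequence number."""
    return hmac.new(SESSION_KEY, seq.to_bytes(4, "big"), hashlib.sha256).digest()


class ToyClient:
    """Models only the sequence-number bookkeeping, nothing else."""

    def __init__(self, retransmit_on_timeout: bool):
        self.retransmit_on_timeout = retransmit_on_timeout
        self.next_seq = 1
        self.expected_seq = None

    def send(self) -> int:
        # Every (re)transmission is encoded with a fresh sequence number.
        seq = self.next_seq
        self.next_seq += 1
        self.expected_seq = seq
        return seq

    def on_timeout(self):
        # NFSv3-like behaviour retransmits; NFSv4-like behaviour keeps waiting.
        if self.retransmit_on_timeout:
            return self.send()
        return None

    def receive(self, reply_mic: bytes) -> str:
        # The reply is checked against the sequence number sent most recently.
        if hmac.compare_digest(reply_mic, mic(self.expected_seq)):
            return "ok"
        return "permission denied"


def run(retransmit_on_timeout: bool) -> str:
    client = ToyClient(retransmit_on_timeout)
    original_seq = client.send()
    client.on_timeout()              # the server is slow, the timer fires
    late_reply = mic(original_seq)   # the reply answers the ORIGINAL request
    return client.receive(late_reply)


if __name__ == "__main__":
    print("retransmit on timeout:", run(True))    # permission denied
    print("no retransmit timeout:", run(False))   # ok
```

In this model the late reply only fails verification when the client has already retransmitted, which matches the tcpdump observation in the report. It says nothing about whether suppressing retransmission is safe on NFSv3, where servers are allowed to drop requests.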