On Fri, 2023-08-25 at 11:12 -0500, Russell Cattelan wrote:
> > > Hi, there!
> > >
> > > We have some shares that use NFSv3 with TCP and Kerberos and have
> > > been hitting an intriguing issue with those. We have noticed that
> > > network instabilities have been causing some 'Permission denied'
> > > errors on files.
> > >
> > > The current scenario we have is based on NFSv3 over TCP clients,
> > > configured with Kerberos (krb5p) authentication against a NetApp
> > > NFS Server (ONTAP). This is happening regardless of the Kernel we
> > > use (our main tests bear 4.15 and 5.15 generic Ubuntu Kernels -
> > > from Bionic and Jammy), and we have not found any interesting
> > > commits in either components upstream that would change the
> > > behaviour in hand.
> > >
> > > We tracked those issues down and found out that the 'Permission
> > > denied' happens because our packets are failing the GSS checksum.
> > > We kept investigating and discovered, after some tcpdump, that
> > > this happens because the client retransmits RPC packets, which
> > > increases the GSS sequence number. Meanwhile, the response to the
> > > original packet gets received, but the checksum fails because the
> > > client is expecting a different GSS sequence number.
> > >
> > > This can be avoided with NFSv4 because the RPC client is created
> > > with a "no retrans timeout" flag. Such a flag is not set and is
> > > impossible to set on NFSv3. We did some investigation and thought
> > > that setting this flag would fix our problems without the need to
> > > move to NFSv4.
> > >
> > > Our question is: is there a reason this flag is not being set nor
> > > is it possible to set it for NFSv3? Is there something on NFSv3
> > > that demands RPC retransmissions even with TCP?
> > > One "hint" we have come across is that it is *explicitly
> > > mentioned* in NFSv4's RFC, and there is nothing in NFSv3 at all -
> > > most likely due to the fact we're dealing with a stateless
> > > protocol.
> > >
> > > Any comments would be greatly appreciated here!
> > >
> > > Thank you,
> > >
> > > [1] https://github.com/torvalds/linux/blob/v5.15/net/sunrpc/auth_gss/gss_krb5_unseal.c#L194
> > > [2] https://github.com/torvalds/linux/blob/v5.15/fs/nfs/client.c#L521
> > > [3] https://datatracker.ietf.org/doc/html/rfc7530#section-3.1.1
> >
> > NFSv3 servers are allowed to drop requests, and NFSv3 clients are
> > expected to retransmit them when this happens. NFSv4 servers may not
> > drop requests, and NFSv4 clients are expected never to retransmit
> > (unless the connection breaks). For that reason we do set
> > RPC_TASK_NO_RETRANS_TIMEOUT on NFSv4 and do not on NFSv3.
> >
> We have been doing a bunch of debugging on this issue and the key
> point / problem we are running into is that because this is a
> kerberos enabled mount, when the client does a re-transmit it ends up
> generating a new MIC header / checksum since the krb5 context sequence
> number has moved on.
>
> If that retrans happens before the original response is received then
> the mic verification fails, since the client is now expecting a
> response to the second packet and not the first. The mic header
> verification failure then results in an EACCES error, which ends up as
> an IO error at the application.
>
> What we have found is that it is easy to repro in our environment by
> adding an iptables rule to drop responses from the nfs server for
> 55-63 seconds.
> Less than 55 sec and the retrans does not happen, and things recover.
> More than 63 sec and the rpc code goes down the reconnect path before
> doing the retrans, and things recover.
>
> It seems like kerberos enabled mounts should be using
> RPC_TASK_NO_RETRANS_TIMEOUT since doing a retrans changes the GSS
> checksum from the original checksum.
>

No, that is not an option. NFSv3 servers are allowed to drop any
incoming RPC request without needing a reason, so turning on
RPC_TASK_NO_RETRANS_TIMEOUT would just lead to client hangs.

The right thing to do is to just fix up rpc_decode_header() to retry
instead of firing off an error in this case.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
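
For readers following the thread, the failure and the suggested
direction can be illustrated with a toy model. This is only a sketch
and not kernel code: the HMAC stands in for the krb5 MIC, and the names
ToyClient, server_reply, handle_reply and run_scenario are hypothetical.
It also assumes that "retry" means discarding the stale reply and
continuing to wait for the reply that matches the latest transmission;
the thread does not spell out what the retry in rpc_decode_header()
would do exactly.

```python
import hashlib
import hmac

KEY = b"toy-session-key"  # stand-in for the Kerberos session key (assumption)


def mic(seq_num: int) -> bytes:
    """Toy MIC over a GSS sequence number (HMAC here, not a real krb5 MIC)."""
    return hmac.new(KEY, seq_num.to_bytes(4, "big"), hashlib.sha256).digest()


class ToyClient:
    """Remembers only the sequence number of the latest transmission per XID."""

    def __init__(self):
        self.next_seq = 1
        self.expected_seq = {}  # xid -> seq number of the most recent send

    def transmit(self, xid):
        # Every (re)transmission takes a fresh GSS sequence number.
        seq = self.next_seq
        self.next_seq += 1
        self.expected_seq[xid] = seq
        return (xid, seq)

    def verify_reply(self, xid, reply_mic):
        return hmac.compare_digest(reply_mic, mic(self.expected_seq[xid]))


def server_reply(request):
    """The server computes the reply MIC over the request it actually saw."""
    xid, seq = request
    return (xid, mic(seq))


def handle_reply(client, reply, retry_on_mismatch):
    xid, reply_mic = reply
    if client.verify_reply(xid, reply_mic):
        return "ok"
    # Behaviour described in the thread: the mismatch surfaces as EACCES.
    # Suggested direction: treat it as retryable instead of fatal.
    return "retry" if retry_on_mismatch else "EACCES"


def run_scenario(retry_on_mismatch):
    client = ToyClient()
    first = client.transmit(xid=7)    # original request, seq 1
    second = client.transmit(xid=7)   # retransmit after timeout, seq 2
    late_reply = server_reply(first)  # reply to the original arrives late
    outcome = handle_reply(client, late_reply, retry_on_mismatch)
    if outcome == "retry":
        # Stale reply dropped; the reply to the retransmission verifies.
        outcome = handle_reply(client, server_reply(second), retry_on_mismatch)
    return outcome


if __name__ == "__main__":
    print("error on mismatch:", run_scenario(retry_on_mismatch=False))
    print("retry on mismatch:", run_scenario(retry_on_mismatch=True))
```

In this model the late reply to the first transmission fails
verification because the client only remembers the sequence number of
the retransmission. Treating that mismatch as retryable lets the request
complete once the matching reply arrives, without disabling
retransmission for NFSv3.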