Kerberized NFS: EACCES errors due to GSS sequence number handling

Nikhil Jha <njha@xxxxxxxxxxxxxx> · Tue, 25 Feb 2025 15:50:36 -0500

Hi all,

I'm writing to report what appears to be a bug in the Linux kernel's
handling of RPCSEC_GSS sequence numbers during NFS request
retransmissions. We've observed this causing spurious EACCES errors in
our environment when using NFS with Kerberos authentication, even on
hard mounts. We've been able to reproduce the issue reliably.

When the client retransmits an operation (for example, because the
server is slow to respond), a new GSS sequence number is associated
with the XID. In the current kernel code, the original sequence number
is discarded. If a response to the original transmission is then
received, there is a GSS sequence number mismatch. The mismatch
triggers another retransmit, possibly repeating the cycle, and after
some number of failed retries EACCES is returned.
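
To make the sequence of events concrete, here is a toy model in Python.
It is only a sketch and not kernel code: the Request class, SESSION_KEY,
and the mic() helper are made up for illustration, and an HMAC stands in
for the GSS MIC that the server computes over the sequence number.

```python
import hashlib
import hmac
import struct

SESSION_KEY = b"toy-session-key"  # hypothetical stand-in for the GSS context key


def mic(seq: int) -> bytes:
    """Stand-in for the GSS MIC over the XDR-encoded sequence number."""
    return hmac.new(SESSION_KEY, struct.pack(">I", seq), hashlib.sha256).digest()


class Request:
    """One outstanding RPC that remembers only its latest sequence number."""

    def __init__(self, xid: int, seq: int):
        self.xid = xid
        self.seq = seq

    def retransmit(self, new_seq: int) -> None:
        # The sequence number of the earlier transmission is overwritten.
        self.seq = new_seq

    def verify_reply(self, verifier: bytes) -> bool:
        # Only the most recent sequence number is checked.
        return hmac.compare_digest(verifier, mic(self.seq))


req = Request(xid=0x1234, seq=10)
req.retransmit(new_seq=11)           # server was slow, so the client retransmits
late_reply = mic(10)                 # server answers the original transmission
print(req.verify_reply(late_reply))  # mismatch, so this prints False
```

In this model the late reply is perfectly valid, but it is rejected
because the client no longer knows it ever sent sequence number 10.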

RFC 2203, section 5.3.3.1, suggests that the client "cache the
RPCSEC_GSS sequence number of each request it sends" and "compute
the checksum of each sequence number in the cache to try to match the
checksum in the reply's verifier." From a quick look, this is what
FreeBSD's implementation does (rpc_gss_validate in
sys/rpc/rpcsec_gss/rpcsec_gss.c).
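
A minimal sketch of the caching approach the RFC describes, again as a
toy Python model rather than kernel or FreeBSD code. CachingRequest,
SESSION_KEY, and mic() are hypothetical names, and an HMAC stands in for
the GSS checksum carried in the reply's verifier.

```python
import hashlib
import hmac
import struct
from typing import Optional

SESSION_KEY = b"toy-session-key"  # hypothetical stand-in for the GSS context key


def mic(seq: int) -> bytes:
    """Stand-in for the GSS checksum over the XDR-encoded sequence number."""
    return hmac.new(SESSION_KEY, struct.pack(">I", seq), hashlib.sha256).digest()


class CachingRequest:
    """One outstanding RPC that caches every sequence number it has sent."""

    def __init__(self, xid: int, seq: int):
        self.xid = xid
        self.seqs = [seq]

    def retransmit(self, new_seq: int) -> None:
        # Keep the earlier sequence numbers instead of discarding them.
        self.seqs.append(new_seq)

    def verify_reply(self, verifier: bytes) -> Optional[int]:
        # Try each cached sequence number, newest first, and return the
        # one whose checksum matches the verifier (None if none match).
        for seq in reversed(self.seqs):
            if hmac.compare_digest(verifier, mic(seq)):
                return seq
        return None


req = CachingRequest(xid=0x1234, seq=10)
req.retransmit(new_seq=11)
print(req.verify_reply(mic(10)))  # reply to the original transmission is accepted
```

A real implementation would presumably bound the size of this cache;
the unbounded list here is only to keep the sketch short.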

Thoughts?

Thanks,
Nikhil