This is a note to let you know that I've just added the patch titled

    NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     nfsd-fix-start-of-nfs-reply-pointer-passed-to-nfsd_cache_update.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From stable+bounces-3080-greg=kroah.com@xxxxxxxxxxxxxxx Tue Nov 28 21:58:43 2023
From: Chuck Lever <cel@xxxxxxxxxx>
Date: Tue, 28 Nov 2023 16:58:34 -0500
Subject: NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()
To: stable@xxxxxxxxxxxxxxx
Cc: linux-nfs@xxxxxxxxxxxxxxx
Message-ID: <170120871426.1376.10151990384789497254.stgit@xxxxxxxxxxxxxxxxxxxxx>

From: Chuck Lever <chuck.lever@xxxxxxxxxx>

[ Upstream commit 1caf5f61dd8430ae5a0b4538afe4953ce7517cbb ]

The "statp + 1" pointer that is passed to nfsd_cache_update() is
supposed to point to the start of the egress NFS Reply header. In
fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.

But both krb5i and krb5p add fields between the RPC header's
accept_stat field and the start of the NFS Reply header. In those
cases, "statp + 1" points at the extra fields instead of the Reply.
The result is that nfsd_cache_update() caches what looks to the
client like garbage.

A connection break can occur for a number of reasons, but the most
common reason when using krb5i/p is a GSS sequence number window
underrun. When an underrun is detected, the server is obliged to
drop the RPC and the connection to force a retransmit with a fresh
GSS sequence number. The client presents the same XID, it hits in
the server's DRC, and the server returns the garbage cache entry.

The "statp + 1" argument has been used since the oldest changeset
in the kernel history repo, so it has been in nfsd_dispatch()
literally since before history began. The problem arose only when
the server-side GSS implementation was added twenty years ago.

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
Tested-by: Jeff Layton <jlayton@xxxxxxxxxx>
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 fs/nfsd/nfssvc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -988,6 +988,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp
 	const struct svc_procedure *proc = rqstp->rq_procinfo;
 	__be32 *statp = rqstp->rq_accept_statp;
 	struct nfsd_cacherep *rp;
+	__be32 *nfs_reply;
 
 	/*
 	 * Give the xdr decoder a chance to change this if it wants
@@ -1008,6 +1009,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp
 		goto out_dropit;
 	}
 
+	nfs_reply = xdr_inline_decode(&rqstp->rq_res_stream, 0);
 	*statp = proc->pc_func(rqstp);
 	if (test_bit(RQ_DROPME, &rqstp->rq_flags))
 		goto out_update_drop;
@@ -1015,7 +1017,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp
 	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
 		goto out_encode_err;
 
-	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
+	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, nfs_reply);
 out_cached_reply:
 	return 1;
 


Patches currently in stable-queue which might be from kroah.com@xxxxxxxxxxxxxxx are

queue-6.6/nfsd-fix-checksum-mismatches-in-the-duplicate-reply-cache.patch
queue-6.6/nfsd-fix-start-of-nfs-reply-pointer-passed-to-nfsd_cache_update.patch
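
Illustration (not part of the patch): the pointer problem described in the
commit message can be sketched in user space. This is a toy model, not kernel
code. struct reply_buf, put_word(), cur_pos() and statp_plus_one_error() are
hypothetical names, and the number of extra words a security flavor inserts
is a parameter chosen for illustration rather than the real krb5i/krb5p
layout. It only shows why recording the encode position just before the
reply is built is more robust than assuming the reply starts at "statp + 1".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLY_WORDS 16

/* Hypothetical stand-in for an XDR reply stream: 32-bit words plus a cursor. */
struct reply_buf {
	uint32_t words[REPLY_WORDS];
	size_t len;	/* words encoded so far */
};

/* Append one word and return a pointer to it. */
static uint32_t *put_word(struct reply_buf *b, uint32_t v)
{
	assert(b->len < REPLY_WORDS);
	b->words[b->len] = v;
	return &b->words[b->len++];
}

/* Current encode position; plays the role of a zero-length reservation. */
static uint32_t *cur_pos(struct reply_buf *b)
{
	return &b->words[b->len];
}

/*
 * Build a toy reply with `extra` words between accept_stat and the NFS
 * reply, and return how far "statp + 1" is from the real reply start.
 */
ptrdiff_t statp_plus_one_error(size_t extra)
{
	struct reply_buf b = { .len = 0 };
	uint32_t *statp, *nfs_reply;
	size_t i;

	put_word(&b, 0x12345678);	/* XID (made-up value) */
	statp = put_word(&b, 0);	/* accept_stat */
	for (i = 0; i < extra; i++)
		put_word(&b, 0xdeadbeef);	/* flavor-specific fields (assumed) */
	nfs_reply = cur_pos(&b);	/* recorded before the reply is encoded */
	put_word(&b, 0);		/* first word of the NFS reply */

	return nfs_reply - (statp + 1);
}
```

With no extra fields the two pointers coincide and the function returns 0.
With any extra words it returns that count, which is how far a cache entry
copied from "statp + 1" would start ahead of the actual NFS reply.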