> On Jan 16, 2024, at 8:52 AM, Connor Smith <connor.smith@xxxxxxxxxxxxxxxxxx> wrote: > > Hi, > > I noticed what I think might be a discrepancy in the way Linux nfs and nfsd manage CREATE_SESSION reply cache sequence IDs. > > In fs/nfs/nfs4proc.c (_nfs4_proc_create_session), the seqid looks to be incremented so long as the status is not one of NFS4ERR_STALE_CLIENTID, NFS4ERR_DELAY, or an RPC timeout. This is mostly due to: > > commit b519d408ea32040b1c7e10b155a3ee9a36660947 > Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > Date: Sun Sep 11 14:50:01 2016 -0400 > > NFSv4.1: Fix the CREATE_SESSION slot number accounting > > Ensure that we conform to the algorithm described in RFC5661, section > 18.36.4 for when to bump the sequence id. In essence we do it for all > cases except when the RPC call timed out, or in case of the server returning > NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID. > > (Note that the comment /* Increment the clientid slot sequence id */ not being moved in the above commit has, I think, left it in the wrong place.) > > Meanwhile, in fs/nfsd/nfs4state.c (nfsd4_create_session), the seqid looks to be incremented only when the session has been successfully created and the status will be NFS4_OK. This is mostly due to: > > commit 86c3e16cc7aace4d1143952813b6cc2a80c51295 > Author: J. Bruce Fields <bfields@xxxxxxxxxx> > Date: Sat Oct 2 17:04:00 2010 -0400 > > nfsd4: confirm only on succesful create_session > > Following rfc 5661, section 18.36.4: "If the session is not successfully > created, then no changes are made to any client records on the server." > We shouldn't be confirming or incrementing the sequence id in this case. > > My reading of RFC8881 18.36.4 suggests that the former is correct - after the sequence ID has been processed in phase 2... > >> [...] the CREATE_SESSION processing goes to the next phase. A subsequent new CREATE_SESSION call over the same client ID MUST use a csa_sequenceid that is one greater than the sequence ID in the slot. > > Client ID confirmation is in phase 3, for example, and returns NFS4ERR_CLID_INUSE. Since this is after phase 2, the sequence ID should be incremented on that error code. My understanding is that a "client record" refers specifically to the 5-tuple described in 18.35.4, which does not include the CREATE_SESSION reply cache, so although I think the latter commit was right to move the confirmation of the client record, I'm not sure about the incrementing of the sequence ID. > > Apologies if I'm mistaken about something here, but I was confused by what looked to me like a difference in 'slot number accounting'. Hello Connor - On first blush, your interpretation of S18.36.4 looks correct to me. We need to study this further and also look into how pynfs treats CREATE_SESSION retransmits. Please file a bug report here: https://bugzilla.kernel.org/ under File Systems / NFS -- Chuck Lever