On Sat, 18 Nov 2023, Chuck Lever wrote: > On Fri, Nov 17, 2023 at 01:18:47PM +1100, NeilBrown wrote: > > The protocol for creating a new state in nfsd is to allocated the state > > leaving it largely uninitialised, add that state to the ->cl_stateids > > idr so as to reserve a state id, then complete initialisation of the > > state and only set ->sc_type to non-zero once the state is fully > > initialised. > > > > If a state is found in the idr with ->sc_type == 0, it is ignored. > > The ->cl_lock list is used to avoid races - it is held while checking > > s/->cl_lock list/->cl_lock lock > > > > sc_type during lookup, > > In particular, find_stateid_locked(), but yet, not in nfs4_find_file() > > Can you help me understand why it's not needed in the latter case? nfs4_find_file() is called from nfs4_check_file() which is called from nfs4_preprocess_stateid_op(), which gets the nfs4_stid from nfsd4_lookup_stateid(). That in turn gets the stateid from find_stateid_by_type() which gets it from find_stateid_locked(). If find_stateid_locked() returns a stateid, then ->sc_type is not 0, and it can never be set to zero (At least after subsequent patches land). Or, more succinctly, nfs4_find_file() does not do lookup, so it doesn't check sc_type against zero, so it doesn't need a lock. > > > > and held when a non-zero value is stored in ->sc_type. > > I see a few additional spots where an sc_type value is set and > cl_lock is not held: > > init_open_stateid ->cl_lock is taken 9 lines before NFS4_OPEN_STID is assigned to >sc_type, and it is dropped 13 lines later. > nfsd4_process_open2 This assignment does not change from zero to non-zero. So it cannot race with lookup, which tests for zero. A later patch changes this assignment to be a change to the new sc_status. > > > > ... except... hash_delegation_locked() finalises the initialisation of a > > delegation state, but does NOT hold ->cl_lock. > > > > So this patch takes ->cl_lock at the appropriate time w.r.t other locks, > > and so ensures there are no races (which are extremely unlikely in any > > case). > > I would have expected that cl_lock should be taken first. Can the > patch description provide some rationale for the lock ordering > you chose? I've added As ->fi_lock is often taken when ->cl_lock is held, we need to take ->cl_lock first of those two. Currently ->cl_lock and state_lock are never both taken at the same time. We need both for this patch so an arbitrary choice is needed concerning which to take first. As state_lock is more global, it might be more contended, so take it first. I'm happy to choose a different ordering for ->cl_lock and state_lock if you have a different justification - I accept that mine isn't particularly strong. > > Jeff asks in another email whether this fix should get copied to > stable. Since the race is unlikely, I'm inclined to wait for an > explicit problem report. I agree. Thanks, NeilBrown > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > fs/nfsd/nfs4state.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index 65fd5510323a..6368788a7d4e 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp) > > > > lockdep_assert_held(&state_lock); > > lockdep_assert_held(&fp->fi_lock); > > + lockdep_assert_held(&clp->cl_lock); > > > > if (nfs4_delegation_exists(clp, fp)) > > return -EAGAIN; > > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, > > goto out_unlock; > > > > spin_lock(&state_lock); > > + spin_lock(&clp->cl_lock); > > spin_lock(&fp->fi_lock); > > if (fp->fi_had_conflict) > > status = -EAGAIN; > > else > > status = hash_delegation_locked(dp, fp); > > spin_unlock(&fp->fi_lock); > > + spin_unlock(&clp->cl_lock); > > spin_unlock(&state_lock); > > > > if (status) > > -- > > 2.42.0 > > > > -- > Chuck Lever >