Re: [PATCH 1/9] nfsd: hold ->cl_lock for hash_delegation_locked()

"NeilBrown" <neilb@xxxxxxx> · Mon, 20 Nov 2023 09:07:41 +1100

On Sat, 18 Nov 2023, Chuck Lever wrote:
> On Fri, Nov 17, 2023 at 01:18:47PM +1100, NeilBrown wrote:
> > The protocol for creating a new state in nfsd is to allocated the state
> > leaving it largely uninitialised, add that state to the ->cl_stateids
> > idr so as to reserve a state id, then complete initialisation of the
> > state and only set ->sc_type to non-zero once the state is fully
> > initialised.
> > 
> > If a state is found in the idr with ->sc_type == 0, it is ignored.
> > The ->cl_lock list is used to avoid races - it is held while checking
> 
> s/->cl_lock list/->cl_lock lock
> 
> 
> > sc_type during lookup,
> 
> In particular, find_stateid_locked(), but yet, not in nfs4_find_file()
> 
> Can you help me understand why it's not needed in the latter case?

nfs4_find_file() is called from nfs4_check_file() which is called from
nfs4_preprocess_stateid_op(), which gets the nfs4_stid from
nfsd4_lookup_stateid().
That in turn gets the stateid from find_stateid_by_type() which gets it
from find_stateid_locked().

If find_stateid_locked() returns a stateid, then ->sc_type is not 0, and
it can never be set to zero (At least after subsequent patches land).

Or, more succinctly, nfs4_find_file() does not do lookup, so it doesn't
check sc_type against zero, so it doesn't need a lock.

> 
> 
> > and held when a non-zero value is stored in  ->sc_type.
> 
> I see a few additional spots where an sc_type value is set and
> cl_lock is not held:
> 
> init_open_stateid

 ->cl_lock is taken 9 lines before NFS4_OPEN_STID is assigned to
  >sc_type, and it is dropped 13 lines later.

> nfsd4_process_open2

This assignment does not change from zero to non-zero.  So it cannot
race with lookup, which tests for zero.
A later patch changes this assignment to be a change to the new
sc_status.

> 
> 
> > ... except... hash_delegation_locked() finalises the initialisation of a
> > delegation state, but does NOT hold ->cl_lock.
> > 
> > So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> > and so ensures there are no races (which are extremely unlikely in any
> > case).
> 
> I would have expected that cl_lock should be taken first. Can the
> patch description provide some rationale for the lock ordering
> you chose?

I've added

As ->fi_lock is often taken when ->cl_lock is held, we need to take
->cl_lock first of those two.
Currently ->cl_lock and state_lock are never both taken at the same time.
We need both for this patch so an arbitrary choice is needed concerning
which to take first.  As state_lock is more global, it might be more
contended, so take it first.

I'm happy to choose a different ordering for ->cl_lock and state_lock if
you have a different justification - I accept that mine isn't
particularly strong.

> 
> Jeff asks in another email whether this fix should get copied to
> stable. Since the race is unlikely, I'm inclined to wait for an
> explicit problem report.

I agree.

Thanks,
NeilBrown

> 
> 
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  fs/nfsd/nfs4state.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 65fd5510323a..6368788a7d4e 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> >  
> >  	lockdep_assert_held(&state_lock);
> >  	lockdep_assert_held(&fp->fi_lock);
> > +	lockdep_assert_held(&clp->cl_lock);
> >  
> >  	if (nfs4_delegation_exists(clp, fp))
> >  		return -EAGAIN;
> > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> >  		goto out_unlock;
> >  
> >  	spin_lock(&state_lock);
> > +	spin_lock(&clp->cl_lock);
> >  	spin_lock(&fp->fi_lock);
> >  	if (fp->fi_had_conflict)
> >  		status = -EAGAIN;
> >  	else
> >  		status = hash_delegation_locked(dp, fp);
> >  	spin_unlock(&fp->fi_lock);
> > +	spin_unlock(&clp->cl_lock);
> >  	spin_unlock(&state_lock);
> >  
> >  	if (status)
> > -- 
> > 2.42.0
> > 
> 
> -- 
> Chuck Lever
>