Re: [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()

"NeilBrown" <neilb@xxxxxxx> · Tue, 05 Mar 2024 10:04:21 +1100

On Tue, 05 Mar 2024, Chuck Lever wrote:
> On Tue, Mar 05, 2024 at 09:36:29AM +1100, NeilBrown wrote:
> > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > On Tue, Mar 05, 2024 at 08:45:45AM +1100, NeilBrown wrote:
> > > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > > On Mon, Mar 04, 2024 at 03:40:21PM +1100, NeilBrown wrote:
> > > > > > move_to_close_lru() waits for sc_count to become zero while holding
> > > > > > rp_mutex.  This can deadlock if another thread holds a reference and is
> > > > > > waiting for rp_mutex.
> > > > > > 
> > > > > > By the time we get to move_to_close_lru() the openowner is unhashed and
> > > > > > cannot be found any more.  So code waiting for the mutex can safely
> > > > > > retry the lookup if move_to_close_lru() has started.
> > > > > > 
> > > > > > So change rp_mutex to an atomic_t with three states:
> > > > > > 
> > > > > >  RP_UNLOCK   - state is still hashed, not locked for reply
> > > > > >  RP_LOCKED   - state is still hashed, is locked for reply
> > > > > >  RP_UNHASHED - state is not hashed, no code can get a lock.
> > > > > > 
> > > > > > Use wait_var_event() to wait for either a lock, or for the owner to be
> > > > > > unhashed.  In the latter case, retry the lookup.
> > > > > > 
> > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > > > > > ---
> > > > > >  fs/nfsd/nfs4state.c | 38 +++++++++++++++++++++++++++++++-------
> > > > > >  fs/nfsd/state.h     |  2 +-
> > > > > >  2 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index 690d0e697320..47e879d5d68a 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > @@ -4430,21 +4430,32 @@ nfsd4_init_leases_net(struct nfsd_net *nn)
> > > > > >  	atomic_set(&nn->nfsd_courtesy_clients, 0);
> > > > > >  }
> > > > > >  
> > > > > > +enum rp_lock {
> > > > > > +	RP_UNLOCKED,
> > > > > > +	RP_LOCKED,
> > > > > > +	RP_UNHASHED,
> > > > > > +};
> > > > > > +
> > > > > >  static void init_nfs4_replay(struct nfs4_replay *rp)
> > > > > >  {
> > > > > >  	rp->rp_status = nfserr_serverfault;
> > > > > >  	rp->rp_buflen = 0;
> > > > > >  	rp->rp_buf = rp->rp_ibuf;
> > > > > > -	mutex_init(&rp->rp_mutex);
> > > > > > +	atomic_set(&rp->rp_locked, RP_UNLOCKED);
> > > > > >  }
> > > > > >  
> > > > > > -static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > -		struct nfs4_stateowner *so)
> > > > > > +static int nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > +				      struct nfs4_stateowner *so)
> > > > > >  {
> > > > > >  	if (!nfsd4_has_session(cstate)) {
> > > > > > -		mutex_lock(&so->so_replay.rp_mutex);
> > > > > > +		wait_var_event(&so->so_replay.rp_locked,
> > > > > > +			       atomic_cmpxchg(&so->so_replay.rp_locked,
> > > > > > +					      RP_UNLOCKED, RP_LOCKED) != RP_LOCKED);
> > > > > 
> > > > > What reliably prevents this from being a "wait forever" ?
> > > > 
> > > > That same thing that reliably prevented the original mutex_lock from
> > > > waiting forever.
> > > > It waits until rp_locked is set to RP_UNLOCKED, which is precisely when
> > > > we previously called mutex_unlock.  But it *also* aborts the wait if
> > > > rp_locked is set to RP_UNHASHED - so it is now more reliable.
> > > > 
> > > > Does that answer the question?
> > > 
> > > Hm. I guess then we are no worse off with wait_var_event().
> > > 
> > > I'm not as familiar with this logic as perhaps I should be. How long
> > > does it take for the wake-up to occur, typically?
> > 
> > wait_var_event() is paired with wake_up_var().
> > The wake up happens when wake_up_var() is called, which in this code is
> > always immediately after atomic_set() updates the variable.
> 
> I'm trying to ascertain the actual wall-clock time that the nfsd thread
> is sleeping, at most. Is this going to be a possible DoS vector? Can
> it impact the ability for the server to shut down without hanging?

Certainly the wall-clock sleep time is no more than it was before this
patch.
At most it is the time it takes for some other request running in some
other nfsd thread to complete.

Well.... technically if there were a large number of concurrent requests
from a client that all required claiming this lock, one of the threads
might be blocked while all the others threads get a turn.  There is no
fairness guarantee with this style of waiting.  So one thread might be
blocked indefinitely while other threads take turns taking the lock.
It's not really a DoS vector as the client is only really denying
service to itself by keeping the server excessively busy.  Other
clients will still get a reasonable turn.

NeilBrown

> 
> 
> -- 
> Chuck Lever
>