On Mon, Mar 04, 2024 at 06:11:24PM -0500, Jeff Layton wrote:
> On Mon, 2024-03-04 at 17:54 -0500, Chuck Lever wrote:
> > On Tue, Mar 05, 2024 at 09:36:29AM +1100, NeilBrown wrote:
> > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > On Tue, Mar 05, 2024 at 08:45:45AM +1100, NeilBrown wrote:
> > > > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > > > On Mon, Mar 04, 2024 at 03:40:21PM +1100, NeilBrown wrote:
> > > > > > > move_to_close_lru() waits for sc_count to become zero while holding
> > > > > > > rp_mutex. This can deadlock if another thread holds a reference and is
> > > > > > > waiting for rp_mutex.
> > > > > > >
> > > > > > > By the time we get to move_to_close_lru() the openowner is unhashed and
> > > > > > > cannot be found any more. So code waiting for the mutex can safely
> > > > > > > retry the lookup if move_to_close_lru() has started.
> > > > > > >
> > > > > > > So change rp_mutex to an atomic_t with three states:
> > > > > > >
> > > > > > >   RP_UNLOCKED - state is still hashed, not locked for reply
> > > > > > >   RP_LOCKED   - state is still hashed, is locked for reply
> > > > > > >   RP_UNHASHED - state is not hashed, no code can get a lock.
> > > > > > >
> > > > > > > Use wait_var_event() to wait for either a lock, or for the owner to be
> > > > > > > unhashed. In the latter case, retry the lookup.
> > > > > > >
> > > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfs4state.c | 38 +++++++++++++++++++++++++++++++-------
> > > > > > >  fs/nfsd/state.h     |  2 +-
> > > > > > >  2 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > > index 690d0e697320..47e879d5d68a 100644
> > > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > > @@ -4430,21 +4430,32 @@ nfsd4_init_leases_net(struct nfsd_net *nn)
> > > > > > >  	atomic_set(&nn->nfsd_courtesy_clients, 0);
> > > > > > >  }
> > > > > > >
> > > > > > > +enum rp_lock {
> > > > > > > +	RP_UNLOCKED,
> > > > > > > +	RP_LOCKED,
> > > > > > > +	RP_UNHASHED,
> > > > > > > +};
> > > > > > > +
> > > > > > >  static void init_nfs4_replay(struct nfs4_replay *rp)
> > > > > > >  {
> > > > > > >  	rp->rp_status = nfserr_serverfault;
> > > > > > >  	rp->rp_buflen = 0;
> > > > > > >  	rp->rp_buf = rp->rp_ibuf;
> > > > > > > -	mutex_init(&rp->rp_mutex);
> > > > > > > +	atomic_set(&rp->rp_locked, RP_UNLOCKED);
> > > > > > >  }
> > > > > > >
> > > > > > > -static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > > -		struct nfs4_stateowner *so)
> > > > > > > +static int nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > > +		struct nfs4_stateowner *so)
> > > > > > >  {
> > > > > > >  	if (!nfsd4_has_session(cstate)) {
> > > > > > > -		mutex_lock(&so->so_replay.rp_mutex);
> > > > > > > +		wait_var_event(&so->so_replay.rp_locked,
> > > > > > > +			       atomic_cmpxchg(&so->so_replay.rp_locked,
> > > > > > > +					      RP_UNLOCKED, RP_LOCKED) != RP_LOCKED);
> > > > > >
> > > > > > What reliably prevents this from being a "wait forever" ?
> > > > >
> > > > > That same thing that reliably prevented the original mutex_lock from
> > > > > waiting forever. Note that this patch fixes a deadlock here.
So clearly, there /were/ situations where "waiting forever" was
possible with the mutex version of this code.

> > > > > It waits until rp_locked is set to RP_UNLOCKED, which is precisely when
> > > > > we previously called mutex_unlock. But it *also* aborts the wait if
> > > > > rp_locked is set to RP_UNHASHED - so it is now more reliable.
> > > > >
> > > > > Does that answer the question?
> > > >
> > > > Hm. I guess then we are no worse off with wait_var_event().
> > > >
> > > > I'm not as familiar with this logic as perhaps I should be. How long
> > > > does it take for the wake-up to occur, typically?
> > >
> > > wait_var_event() is paired with wake_up_var().
> > > The wake up happens when wake_up_var() is called, which in this code is
> > > always immediately after atomic_set() updates the variable.
> >
> > I'm trying to ascertain the actual wall-clock time that the nfsd thread
> > is sleeping, at most. Is this going to be a possible DoS vector? Can
> > it impact the ability for the server to shut down without hanging?
>
> Prior to this patch, there was a mutex in play here and we just released
> it to wake up the waiters. This is more or less doing the same thing, it
> just indicates the resulting state better.

Well, it adds a third state so that a recovery action can be taken on
wake-up in some cases. That avoids a deadlock, so this does count as a
bug fix.

> I doubt this will materially change how long the tasks are waiting.

It might not be a longer wait, but it still seems difficult to prove
that the wait_var_event() will /always/ be awoken somehow.

Applying for now.

--
Chuck Lever