Re: [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2023-09-18 at 11:17 +1000, NeilBrown wrote:
> On Mon, 18 Sep 2023, trondmy@xxxxxxxxxx wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > 
> > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
> > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
> > nfs4_schedule_state_manager(), and are responsible for handling the
> > recovery situation.
> > 
> > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > ---
> >  fs/nfs/nfs4state.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index e079987af4a3..0bc160fbabec 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct
> > nfs_client *clp)
> >                 nfs4_end_drain_session(clp);
> >                 nfs4_clear_state_manager_bit(clp);
> >  
> > +               if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)
> > &&
> > +                   !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
> > +                                     &clp->cl_state)) {
> > +                       memflags = memalloc_nofs_save();
> > +                       continue;
> > +               }
> > +
> 
> I cannot see what race this closes.
> When we cleared MANAGER_RUNNING, the only possible consequence is
> that
> nfs4_wait_clnt_recover could have woken up.  This leads to
> nfs4_schedule_state_manager() being run, which sets RUN_MANAGER
> whether
> it was set or not.
> 
> I understand that there are problems with MANAGER_AVAILABLE which
> your
> next patch addresses, but I cannot see what this one actually fixes.
> Can you help me?
> 

If NFS4CLNT_RUN_MANAGER gets set while we're handling the reboot
recovery or network partition, then NFS4CLNT_MANAGER_RUNNING will be
set, so nfs4_schedule_state_manager() will just exit rather than start
a new thread. If we don't catch that situation before we start handling
the asynchronous delegation returns, then we can deadlock.

If, OTOH, nfs4_schedule_state_manager() runs after we've cleared
NFS4CLNT_MANAGER_RUNNING, then we should be OK (assuming both patches
are applied).

Cheers
 Trond

> Thanks,
> NeilBrown
> 
> 
> 
> >                 if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING,
> > &clp->cl_state)) {
> >                         if
> > (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
> >                                 nfs_client_return_marked_delegation
> > s(clp);
> > -- 
> > 2.41.0
> > 
> > 
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux