On Mon, 2023-09-18 at 11:17 +1000, NeilBrown wrote: > On Mon, 18 Sep 2023, trondmy@xxxxxxxxxx wrote: > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared > > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against > > nfs4_schedule_state_manager(), and are responsible for handling the > > recovery situation. > > > > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > --- > > fs/nfs/nfs4state.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > index e079987af4a3..0bc160fbabec 100644 > > --- a/fs/nfs/nfs4state.c > > +++ b/fs/nfs/nfs4state.c > > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct > > nfs_client *clp) > > nfs4_end_drain_session(clp); > > nfs4_clear_state_manager_bit(clp); > > > > + if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) > > && > > + !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, > > + &clp->cl_state)) { > > + memflags = memalloc_nofs_save(); > > + continue; > > + } > > + > > I cannot see what race this closes. > When we cleared MANAGER_RUNNING, the only possible consequence is > that > nfs4_wait_clnt_recover could have woken up. This leads to > nfs4_schedule_state_manager() being run, which sets RUN_MANAGER > whether > it was set or not. > > I understand that there are problems with MANAGER_AVAILABLE which > your > next patch addresses, but I cannot see what this one actually fixes. > Can you help me? > If NFS4CLNT_RUN_MANAGER gets set while we're handling the reboot recovery or network partition, then NFS4CLNT_MANAGER_RUNNING will be set, so nfs4_schedule_state_manager() will just exit rather than start a new thread. If we don't catch that situation before we start handling the asynchronous delegation returns, then we can deadlock. If, OTOH, nfs4_schedule_state_manager() runs after we've cleared NFS4CLNT_MANAGER_RUNNING, then we should be OK (assuming both patches are applied). Cheers Trond > Thanks, > NeilBrown > > > > > if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, > > &clp->cl_state)) { > > if > > (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) { > > nfs_client_return_marked_delegation > > s(clp); > > -- > > 2.41.0 > > > > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx