Re: [PATCH 1/1] NFSD: fix hang in nfsd4_shutdown_callback

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 29 Jan 2025 17:38:21 -0500

On Wed, 2025-01-29 at 16:39 -0500, Chuck Lever wrote:
> On 1/29/25 4:17 PM, Dai Ngo wrote:
> > If the nfs4_client is in COURTESY state, there is no point in
> > retrying the callback. The retries cause nfsd4_shutdown_callback to
> > hang, since cl_cb_inflight remains non-zero. The hang lasts about 15
> > minutes, until TCP notifies NFSD that the connection was closed.
> > 
> > This patch modifies nfsd4_cb_sequence_done to skip restarting the
> > RPC if the nfs4_client is in COURTESY state.
> > 
> > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > ---
> >  fs/nfsd/nfs4callback.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 50e468bdb8d4..c90f94898cc5 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1372,6 +1372,11 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  		ret = false;
> >  		break;
> >  	case 1:
> > +		if (clp->cl_state == NFSD4_COURTESY) {
> > +			nfsd4_mark_cb_fault(cb->cb_clp);
> > +			ret = false;
> > +			break;
> > +		}
> 
> We could toss this operation here or at the top of
> nfsd4_run_cb_work().
> 

Like Olga, I'm wondering if this is the culprit behind the recently
reported rpc_shutdown_client hangs.
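
To make the failure mode concrete, here is a rough userspace model. It
is not the nfsd code: the struct, enum and function names below are
hypothetical stand-ins that only echo the names in the patch. It assumes
a callback whose reply never arrives, and a shutdown that can only
finish once the in-flight count drops to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the nfsd structures.
 * The names echo the patch, but these are NOT the kernel definitions. */
enum client_state { CLIENT_ACTIVE, CLIENT_COURTESY };

struct model_client {
	enum client_state cl_state;
	int cl_cb_inflight;   /* callbacks queued or in progress */
	bool cb_fault;        /* set when we give up on the backchannel */
};

/* Model of the done handler for a callback whose reply never arrived
 * (cb_seq_status still 1). Returns true if the RPC should be restarted. */
static bool model_sequence_done(struct model_client *clp, bool courtesy_check)
{
	if (courtesy_check && clp->cl_state == CLIENT_COURTESY) {
		clp->cb_fault = true;   /* analogous to nfsd4_mark_cb_fault() */
		return false;           /* do not restart */
	}
	return true;                    /* no reply: restart and try again */
}

/* Run one callback for at most max_ticks attempts. Returns the attempt on
 * which cl_cb_inflight dropped back to 0, or -1 if it never did within the
 * budget (i.e. a shutdown waiting on the counter would still be stuck). */
static int model_shutdown_wait(struct model_client *clp, bool courtesy_check,
			       int max_ticks)
{
	clp->cl_cb_inflight++;          /* callback queued */
	for (int tick = 1; tick <= max_ticks; tick++) {
		if (!model_sequence_done(clp, courtesy_check)) {
			clp->cl_cb_inflight--;  /* callback released */
			assert(clp->cl_cb_inflight >= 0);
			return tick;
		}
	}
	return -1;                      /* still in flight */
}
```

Without the courtesy check, model_shutdown_wait() runs out its budget
with cl_cb_inflight still at 1, which is the stuck-shutdown case. With
the check, a courtesy client's callback is faulted and released on the
first attempt. An active client with no reply still retries in this
model; in the real code it is the TCP timeout that ends that case.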

I'm assuming that we don't want to do any callbacks at all to a
courtesy client? If that's the case, then I think doing the check early
in nfsd4_run_cb_work() would be better. That would also prevent new
callbacks from being queued until the cl_state changes.
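
A similar hypothetical model of doing the check at the top of the work
function. model_queue_cb() and model_run_cb_work() are illustrative
names, not nfsd functions, and the model assumes an active client's
reply always arrives. A courtesy client's callback is dropped before any
RPC is issued, so there is nothing left to retry and the in-flight count
returns to 0.

```c
#include <assert.h>

/* Hypothetical model: these types and functions are illustrative
 * stand-ins, not the nfsd implementation. */
enum client_state { CLIENT_ACTIVE, CLIENT_COURTESY };

struct model_client {
	enum client_state cl_state;
	int cl_cb_inflight;  /* callbacks queued but not yet released */
	int sent;            /* callbacks whose RPC was actually issued */
	int dropped;         /* callbacks discarded before any RPC */
};

/* Queue a callback: the in-flight count goes up until the work runs. */
static void model_queue_cb(struct model_client *clp)
{
	clp->cl_cb_inflight++;
}

/* Model of an early check at the top of the callback work function. */
static void model_run_cb_work(struct model_client *clp)
{
	assert(clp->cl_cb_inflight > 0);  /* work must have been queued */
	if (clp->cl_state == CLIENT_COURTESY)
		clp->dropped++;           /* bail out before sending anything */
	else
		clp->sent++;              /* reply assumed to arrive here */
	clp->cl_cb_inflight--;            /* either way, the callback is released */
}
```

Once cl_state goes back to CLIENT_ACTIVE, callbacks are sent again, so
the courtesy client only stops receiving them while it stays in that
state.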

> 
> >  		/*
> >  		 * cb_seq_status remains 1 if an RPC Reply was never
> >  		 * received. NFSD can't know if the client processed
> 
> 

Nice catch though!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>