Re: [PATCH 1/8] nfsd: don't restart v4.1+ callback when RPC_SIGNALLED is set

Jeff Layton <jlayton@xxxxxxxxxx> · Sun, 26 Jan 2025 06:18:21 -0500

On Sun, 2025-01-26 at 10:01 +1100, NeilBrown wrote:
> On Fri, 24 Jan 2025, Jeff Layton wrote:
> > This is problematic, since the RPC might have been entirely successful.
> > There is no point in restarting a v4.1+ callback just because
> > RPC_SIGNALLED is true. The v4.1+ error handling has other mechanisms for
> > detecting when it should retransmit the RPC.
> 
> But why might RPC_SIGNALLED() ever be true?
> The flag RPC_TASK_SIGNALLED is only ever set by rpc_signal_task() which
> is only called from rpc_killall_tasks() and __rpc_execute() for
> non-async tasks which doesn't apply to nfsd callbacks as they are
> started with rpc_call_async().
> 
> rpc_killall_tasks() is called by fs/nfs/ which isn't relevant for us,
> and from rpc_shutdown_client().  In those cases we certainly don't want
> the request to be retried, though the nfsd4_process_cb_update() case is
> a little interesting as we do want it to be retried, but in a different
> client.
>
> So the code you are removing is either dead code because something else
> will prevent the restart when a client is being shut down, or it is bad
> code because it would delay rpc_shutdown_client() while the request is
> retried. 
> 
> I haven't dug the extra step to figure out which, but either way I think
> the code should go.
> 
> 

Thanks. That was my analysis too.

rpc_shutdown_client() is called when we tear down and rebuild the
rpc_client. nfsd does this in setup_callback_client(), which gets
called from nfsd4_process_cb_update() (basically when we detect that
the backchannel is having problems).

There are really only two states: We either got a reply from the server
before the client went down, or we didn't. In the case where we got a
reply, there is no need to retry anything. In the case where we didn't,
the tk_status will be '1', so there is no need to check RPC_SIGNALLED()
at all here.

The existing code could lead to the call being retried when we had
already gotten a perfectly valid reply.

> > 
> > Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/nfsd/nfs4callback.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 50e468bdb8d4838b5217346dcc2bd0fec1765c1a..e12205ef16ca932ffbcc86d67b0817aec2436c89 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1403,9 +1403,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> >  	}
> >  	trace_nfsd_cb_free_slot(task, cb);
> >  	nfsd41_cb_release_slot(cb);
> > -
> > -	if (RPC_SIGNALLED(task))
> > -		goto need_restart;
> >  out:
> >  	return ret;
> >  retry_nowait:
> > 
> > -- 
> > 2.48.1
> > 
> > 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>