Re: [PATCH v2 1/3] NFSD: mark cl_cb_state as NFSD4_CB_DOWN if cl_cb_client is NULL

Chuck Lever <chuck.lever@xxxxxxxxxx> · Tue, 23 Apr 2024 14:08:07 -0400

On Tue, Apr 23, 2024 at 10:49:25AM -0700, Dai Ngo wrote:
> 
> On 4/23/24 6:41 AM, Chuck Lever wrote:
> > On Mon, Apr 22, 2024 at 08:12:31PM -0700, Dai Ngo wrote:
> > > In nfsd4_run_cb_work if the rpc_clnt for the back channel is no longer
> > > exists, the callback state in nfs4_client should be marked as NFSD4_CB_DOWN
> > > so the server can notify the client to establish a new back channel
> > > connection.
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > ---
> > >   fs/nfsd/nfs4callback.c | 9 +++++++--
> > >   1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index cf87ace7a1b0..f8bb5ff2e9ac 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1491,9 +1491,14 @@ nfsd4_run_cb_work(struct work_struct *work)
> > >   	clnt = clp->cl_cb_client;
> > >   	if (!clnt) {
> > > -		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
> > > +		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
> > >   			nfsd41_destroy_cb(cb);
> > > -		else {
> > > +			clear_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags);
> > > +
> > > +			/* let client knows BC is down when it reconnects */
> > > +			clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
> > > +			nfsd4_mark_cb_down(clp);
> > > +		} else {
> > >   			/*
> > >   			 * XXX: Ideally, we could wait for the client to
> > >   			 *	reconnect, but I haven't figured out how
> > NFSD4_CLIENT_CB_KILL is for when the lease is getting expunged. It's
> > not supposed to be used when only the transport is closed.
> 
> The reason NFSD4_CLIENT_CB_KILL needs to be set when the transport is
> closed is because of commit c1ccfcf1a9bf3.
> 
> When the transport is closed, nfsd4_conn_lost is called which then calls
> nfsd4_probe_callback to set NFSD4_CLIENT_CB_UPDATE and schedule cl_cb_null
> work to activate the callback worker (nfsd4_run_cb_work) to do the update.
> 
> Callback worker calls nfsd4_process_cb_update to do rpc_shutdown_client
> then clear cl_cb_client.
> 
> When nfsd4_process_cb_update returns to nfsd4_run_cb_work, if cl_cb_client
> is NULL and NFSD4_CLIENT_CB_KILL not set then it re-queues the callback,
> causing an infinite loop.

That's the way it is supposed to work today. The callback is
re-queued until the client reconnects, at which point the loop is
broken.

> > Thus, shouldn't you mark_cb_down in this arm, instead?
> 
> I'm not clear what you mean here, the callback worker calls
> nfsd4_mark_cb_down after destroying the callback.

No, I mean in the re-queue case.

> > Even so, isn't the
> > backchannel already marked down when we get here?
> 
> No, according to my testing. Without marking the back channel down the
> client does not re-establish the back channel when it reconnects.

I didn't expect that closing the transport on the server side would
need any changes in fs/nfsd/nfs4callback.c. Let me get the
backchannel retransmit behavior sorted first. I'm still working on
setting up a test rig here.

-- 
Chuck Lever