On Tue, Apr 23, 2024 at 10:49:25AM -0700, Dai Ngo wrote: > > On 4/23/24 6:41 AM, Chuck Lever wrote: > > On Mon, Apr 22, 2024 at 08:12:31PM -0700, Dai Ngo wrote: > > > In nfsd4_run_cb_work if the rpc_clnt for the back channel is no longer > > > exists, the callback state in nfs4_client should be marked as NFSD4_CB_DOWN > > > so the server can notify the client to establish a new back channel > > > connection. > > > > > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> > > > --- > > > fs/nfsd/nfs4callback.c | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > > > index cf87ace7a1b0..f8bb5ff2e9ac 100644 > > > --- a/fs/nfsd/nfs4callback.c > > > +++ b/fs/nfsd/nfs4callback.c > > > @@ -1491,9 +1491,14 @@ nfsd4_run_cb_work(struct work_struct *work) > > > clnt = clp->cl_cb_client; > > > if (!clnt) { > > > - if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) > > > + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) { > > > nfsd41_destroy_cb(cb); > > > - else { > > > + clear_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags); > > > + > > > + /* let client knows BC is down when it reconnects */ > > > + clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); > > > + nfsd4_mark_cb_down(clp); > > > + } else { > > > /* > > > * XXX: Ideally, we could wait for the client to > > > * reconnect, but I haven't figured out how > > NFSD4_CLIENT_CB_KILL is for when the lease is getting expunged. It's > > not supposed to be used when only the transport is closed. > > The reason NFSD4_CLIENT_CB_KILL needs to be set when the transport is > closed is because of commit c1ccfcf1a9bf3. > > When the transport is closed, nfsd4_conn_lost is called which then calls > nfsd4_probe_callback to set NFSD4_CLIENT_CB_UPDATE and schedule cl_cb_null > work to activate the callback worker (nfsd4_run_cb_work) to do the update. > > Callback worker calls nfsd4_process_cb_update to do rpc_shutdown_client > then clear cl_cb_client. > > When nfsd4_process_cb_update returns to nfsd4_run_cb_work, if cl_cb_client > is NULL and NFSD4_CLIENT_CB_KILL not set then it re-queues the callback, > causing an infinite loop. That's the way it is supposed to work today. The callback is re-queued until the client reconnects, at which point the loop is broken. > > Thus, shouldn't you mark_cb_down in this arm, instead? > > I'm not clear what you mean here, the callback worker calls > nfsd4_mark_cb_down after destroying the callback. No, I mean in the re-queue case. > > Even so, isn't the > > backchannel already marked down when we get here? > > No, according to my testing. Without marking the back channel down the > client does not re-establish the back channel when it reconnects. I didn't expect that closing the transport on the server side would need any changes in fs/nfsd/nfs4callback.c. Let me get the backchannel retransmit behavior sorted first. I'm still working on setting up a test rig here. -- Chuck Lever