On Thu, Jan 25, 2024 at 03:19:41PM -0500, Jeff Layton wrote: > On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote: > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > As part of managing a client disconnect, NFSD closes down and > > replaces the backchannel rpc_clnt. > > > > If a callback operation is pending when the backchannel rpc_clnt is > > shut down, currently nfsd4_run_cb_work() just discards that > > callback. But there are multiple cases to deal with here: > > > > o The client's lease is getting destroyed. Throw the CB away. > > > > o The client disconnected. It might be forcing a retransmit of > > CB operations, or it could have disconnected for other reasons. > > Reschedule the CB so it is retransmitted when the client > > reconnects. > > > > Since callback operations can now be rescheduled, ensure that > > cb_ops->prepare can be called only once by moving the > > cb_ops->prepare paragraph down to just before the rpc_call_async() > > call. > > > > Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()") > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > --- > > fs/nfsd/nfs4callback.c | 26 +++++++++++++++++--------- > > 1 file changed, 17 insertions(+), 9 deletions(-) > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > > index 43b0a34a5d5b..b2844abcb51f 100644 > > --- a/fs/nfsd/nfs4callback.c > > +++ b/fs/nfsd/nfs4callback.c > > @@ -1375,20 +1375,22 @@ nfsd4_run_cb_work(struct work_struct *work) > > struct rpc_clnt *clnt; > > int flags; > > > > - if (cb->cb_need_restart) { > > - cb->cb_need_restart = false; > > - } else { > > - if (cb->cb_ops && cb->cb_ops->prepare) > > - cb->cb_ops->prepare(cb); > > - } > > - > > if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK) > > nfsd4_process_cb_update(cb); > > > > clnt = clp->cl_cb_client; > > if (!clnt) { > > - /* Callback channel broken, or client killed; give up: */ > > - nfsd41_destroy_cb(cb); > > + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) > > + nfsd41_destroy_cb(cb); > > + else { > > + /* > > + * XXX: Ideally, we would wait for the client to > > + * reconnect, but I haven't figured out how > > + * to do that yet. > > + */ > > + msleep(30); > > + nfsd4_queue_cb(cb); > > It would probably be better to just queue the cb as delayed_work here, > so you don't squat on the workqueue thread. You found my placeholder :-) > That'll mean changing > cb_work to struct delayed_work, but that should be NBD. I've looked at that. I wanted to be sure, before going that route, that there is no obvious way to implement a "wait for client to reconnect". msleep (or delayed_work) is basically a slow busy wait. > > + } > > return; > > } > > > > @@ -1401,6 +1403,12 @@ nfsd4_run_cb_work(struct work_struct *work) > > return; > > } > > > > + if (cb->cb_need_restart) { > > + cb->cb_need_restart = false; > > + } else { > > + if (cb->cb_ops && cb->cb_ops->prepare) > > + cb->cb_ops->prepare(cb); > > + } > > cb->cb_msg.rpc_cred = clp->cl_cb_cred; > > flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN; > > rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags, > > > > > > > > -- > Jeff Layton <jlayton@xxxxxxxxxx> -- Chuck Lever