On Thu, 2024-01-25 at 11:28 -0500, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > As part of managing a client disconnect, NFSD closes down and > replaces the backchannel rpc_clnt. > > If a callback operation is pending when the backchannel rpc_clnt is > shut down, currently nfsd4_run_cb_work() just discards that > callback. But there are multiple cases to deal with here: > > o The client's lease is getting destroyed. Throw the CB away. > > o The client disconnected. It might be forcing a retransmit of > CB operations, or it could have disconnected for other reasons. > Reschedule the CB so it is retransmitted when the client > reconnects. > > Since callback operations can now be rescheduled, ensure that > cb_ops->prepare can be called only once by moving the > cb_ops->prepare paragraph down to just before the rpc_call_async() > call. > > Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()") > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > --- > fs/nfsd/nfs4callback.c | 26 +++++++++++++++++--------- > 1 file changed, 17 insertions(+), 9 deletions(-) > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c > index 43b0a34a5d5b..b2844abcb51f 100644 > --- a/fs/nfsd/nfs4callback.c > +++ b/fs/nfsd/nfs4callback.c > @@ -1375,20 +1375,22 @@ nfsd4_run_cb_work(struct work_struct *work) > struct rpc_clnt *clnt; > int flags; > > - if (cb->cb_need_restart) { > - cb->cb_need_restart = false; > - } else { > - if (cb->cb_ops && cb->cb_ops->prepare) > - cb->cb_ops->prepare(cb); > - } > - > if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK) > nfsd4_process_cb_update(cb); > > clnt = clp->cl_cb_client; > if (!clnt) { > - /* Callback channel broken, or client killed; give up: */ > - nfsd41_destroy_cb(cb); > + if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) > + nfsd41_destroy_cb(cb); > + else { > + /* > + * XXX: Ideally, we would wait for the client to > + * reconnect, but I haven't figured out how > + * to do that yet. > + */ > + msleep(30); > + nfsd4_queue_cb(cb); It would probably be better to just queue the cb as delayed_work here, so you don't squat on the workqueue thread. That'll mean changing cb_work to struct delayed_work, but that should be NBD. > + } > return; > } > > @@ -1401,6 +1403,12 @@ nfsd4_run_cb_work(struct work_struct *work) > return; > } > > + if (cb->cb_need_restart) { > + cb->cb_need_restart = false; > + } else { > + if (cb->cb_ops && cb->cb_ops->prepare) > + cb->cb_ops->prepare(cb); > + } > cb->cb_msg.rpc_cred = clp->cl_cb_cred; > flags = clp->cl_minorversion ? RPC_TASK_NOCONNECT : RPC_TASK_SOFTCONN; > rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | flags, > > > -- Jeff Layton <jlayton@xxxxxxxxxx>