On Tue, Dec 17, 2024 at 9:58 PM NeilBrown <neilb@xxxxxxx> wrote:
>
>
> Hi,
>  I've been pondering the messages
>
>    receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt XXXXXXX xid XXXXXX

I wonder why you are seeing this. There was this commit 05a4b58301c3
"SUNRPC: remove printk when back channel request not found" from a year
ago...

> that turn up occasionally. Google reports a variety of hits, and I've
> seen them in logs from a customer, though I don't think they were
> directly related to the customer's problem.
>
> These messages suggest a callback reply from the client which the server
> was not expecting. I think the most likely cause is that the server called
>   rpc_shutdown_client(clp->cl_cb_client);
> while there were outstanding callbacks.
> This causes rpc_killall_tasks() to be called, so the tasks stop
> waiting for a reply and are discarded.
>
> The rpc_shutdown_client() call can come from nfsd4_process_cb_update(),
> which gets run whenever nfsd4_probe_callback() is called. This happens
> in quite a few places, including when a new connection is bound to a
> session.
>
> So if a new connection is bound, the current callback channel is aborted
> even though it is working perfectly well. That is particularly
> problematic as callback requests are not currently retransmitted.
>
> So I'm wondering if nfsd4_process_cb_update() should only shut down the
> current cb client if there is evidence that it isn't working.
>
> I'm not certain how best to do that. One option might be to do a search
> similar to that in __nfsd4_find_backchannel() and see if the current
> session and xprt are still valid. There might be a better way.
>
> Thoughts?
>
> Thanks,
> NeilBrown
>
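A toy userspace model of the race described above may make it concrete. It is a sketch only: every `model_*` name is hypothetical and is not an NFSD or SUNRPC symbol. Clearing the outstanding-xid table plays the role of rpc_killall_tasks(), the printf() plays the role of the receive_cb_reply() message, and the `backchannel_ok` flag stands in for whatever check (for example, a search like the one in __nfsd4_find_backchannel()) would decide that the existing callback client is still usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, NOT kernel code. It only mimics the
 * roles of rpc_killall_tasks() and receive_cb_reply().
 */
#define MAX_OUTSTANDING 16

struct model_cb_client {
	unsigned int xids[MAX_OUTSTANDING];	/* xids still waiting for a reply */
	int nr_outstanding;
};

/* Record a callback as sent and awaiting a reply. */
void model_send_callback(struct model_cb_client *clnt, unsigned int xid)
{
	assert(clnt->nr_outstanding < MAX_OUTSTANDING);
	clnt->xids[clnt->nr_outstanding++] = xid;
}

/* Stand-in for rpc_killall_tasks(): waiters are dropped, no reply expected. */
void model_shutdown_client(struct model_cb_client *clnt)
{
	clnt->nr_outstanding = 0;
}

/* Match a reply to an outstanding xid; report it if nothing is waiting. */
bool model_receive_reply(struct model_cb_client *clnt, unsigned int xid)
{
	for (int i = 0; i < clnt->nr_outstanding; i++) {
		if (clnt->xids[i] == xid) {
			/* remove by moving the last entry into this slot */
			clnt->xids[i] = clnt->xids[--clnt->nr_outstanding];
			return true;
		}
	}
	printf("Got unrecognized reply: xid %08x\n", xid);
	return false;
}

/*
 * only_if_broken == false: current behaviour, every update tears down
 * the client. only_if_broken == true: proposed behaviour, keep the
 * client (and its outstanding calls) while the backchannel looks healthy.
 */
void model_process_cb_update(struct model_cb_client *clnt,
			     bool only_if_broken, bool backchannel_ok)
{
	if (only_if_broken && backchannel_ok)
		return;
	model_shutdown_client(clnt);
}
```

With `only_if_broken` false, a reply to a callback sent before the update finds no waiter and is reported as unrecognized; with it true and the backchannel still healthy, the same reply is matched normally. Since callback requests are not retransmitted, the first case is where a callback is effectively lost.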