> On Aug 19, 2021, at 9:01 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote: >> With NFSv4.1+ on RDMA, backchannel recovery appears not to work. >> >> xprt_setup_xxx_bc() is invoked by the client's first CREATE_SESSION >> operation, and it always marks the rpc_clnt's transport as >> connected. >> >> On a subsequent CREATE_SESSION, if rpc_create() is called and >> xpt_bc_xprt is populated, it might not be connected (for instance, >> if a backchannel fault has occurred). Ensure that code path returns >> a connected xprt also. >> >> Reported-by: Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >> --- >> net/sunrpc/clnt.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >> index 8b4de70e8ead..570480a649a3 100644 >> --- a/net/sunrpc/clnt.c >> +++ b/net/sunrpc/clnt.c >> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct >> rpc_create_args *args) >> xprt = args->bc_xprt->xpt_bc_xprt; >> if (xprt) { >> xprt_get(xprt); >> + xprt_set_connected(xprt); >> return rpc_create_xprt(args, xprt); >> } >> } >> >> > > No. This is wrong. If the connection got disconnected, then the client > needs to reconnect and build a new connection altogether. We can't just > make pretend that the old connection still exists. The patch description is not clear: the client has not disconnected. The forward channel is functioning properly, and the server has set SEQ4_STATUS_BACKCHANNEL_FAULT. To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair on the existing connection. On the server, setup_callback_client() invokes rpc_create() again -- it's this step that is failing during the second CREATE_SESSION on a connection because the old xprt is returned but it's still marked disconnected. An alternative would be to ensure that setup_callback_client() always puts xpt_bc_xprt before it invokes rpc_create(). But it looked to me like rpc_create() already has a bunch of logic to deal with an existing xpt_bc_xprt. -- Chuck Lever