> On Aug 19, 2021, at 8:14 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote: >> >> >>> On Aug 19, 2021, at 10:14 AM, Trond Myklebust >>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>> >>> On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote: >>>> >>>> >>>>> On Aug 19, 2021, at 9:01 AM, Trond Myklebust >>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>> >>>>> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote: >>>>>> With NFSv4.1+ on RDMA, backchannel recovery appears not to >>>>>> work. >>>>>> >>>>>> xprt_setup_xxx_bc() is invoked by the client's first >>>>>> CREATE_SESSION >>>>>> operation, and it always marks the rpc_clnt's transport as >>>>>> connected. >>>>>> >>>>>> On a subsequent CREATE_SESSION, if rpc_create() is called and >>>>>> xpt_bc_xprt is populated, it might not be connected (for >>>>>> instance, >>>>>> if a backchannel fault has occurred). Ensure that code path >>>>>> returns >>>>>> a connected xprt also. >>>>>> >>>>>> Reported-by: Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>> --- >>>>>> net/sunrpc/clnt.c | 1 + >>>>>> 1 file changed, 1 insertion(+) >>>>>> >>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >>>>>> index 8b4de70e8ead..570480a649a3 100644 >>>>>> --- a/net/sunrpc/clnt.c >>>>>> +++ b/net/sunrpc/clnt.c >>>>>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct >>>>>> rpc_create_args *args) >>>>>> xprt = args->bc_xprt->xpt_bc_xprt; >>>>>> if (xprt) { >>>>>> xprt_get(xprt); >>>>>> + xprt_set_connected(xprt); >>>>>> return rpc_create_xprt(args, xprt); >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>> >>>>> No. This is wrong. If the connection got disconnected, then the >>>>> client >>>>> needs to reconnect and build a new connection altogether. We >>>>> can't >>>>> just >>>>> make pretend that the old connection still exists. >>>> >>>> The patch description is not clear: the client has not >>>> disconnected. >>>> The forward channel is functioning properly, and the server has >>>> set >>>> SEQ4_STATUS_BACKCHANNEL_FAULT. >>>> >>>> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION >>>> pair >>>> on the existing connection. On the server, >>>> setup_callback_client() >>>> invokes rpc_create() again -- it's this step that is failing >>>> during >>>> the second CREATE_SESSION on a connection because the old xprt >>>> is returned but it's still marked disconnected. >>> >>> How is that happening? >> >> The RPC client's autodisconnect is firing. This was fixed in >> 6820bf77864d >> ("svcrdma: disable timeouts on rdma backchannel"). >> >> However we're still seeing cases where backchannel recovery is not >> working. See: >> >> https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@xxxxxxxxxxxxxxxx/T/#t >> >> As an experiment, we've reverted 6820bf77864d to try to provoke the >> bug. >> >> >>> As far as I can tell, the only thing that can >>> cause the backchannel to be marked as closed is a call to >>> svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops- >>>> close(xprt->xpt_bc_xprt). >>> >>> The server shouldn't be reusing anything from that svc_xprt after >>> that. >> >>>> An alternative would be to ensure that setup_callback_client() >>>> always puts xpt_bc_xprt before it invokes rpc_create(). But it >>>> looked to me like rpc_create() already has a bunch of logic to >>>> deal with an existing xpt_bc_xprt. >>> >>> The connection status is managed by the transport layer, not the >>> RPC >>> client layer. >> >> If it's a bug to mark a backchannel transport disconnected, then >> asserts should be added to capture the problem. > > It isn't a bug to mark the transport as disconnected. It is a bug to > mark it as disconnected and then to continue using it as if it were > connected. Are you suggesting the server's backchannel logic should look for -107 status after posting an RPC, and then recover somehow? Perhaps it could put xpt_bc_xprt then set it to NULL, and then invoke setup_callback_client() again ? >> What is the purpose of this code in rpc_create() ? >> >> 533 if (args->bc_xprt) { >> 534 WARN_ON_ONCE(!(args->protocol & >> XPRT_TRANSPORT_BC)); >> 535 xprt = args->bc_xprt->xpt_bc_xprt; >> 536 if (xprt) { >> 537 xprt_get(xprt); >> 538 return rpc_create_xprt(args, xprt); >> 539 } >> 540 } >> > > Are you asking me or Bruce? Fair enough, adding Bruce. > I'm not much of a fan of this code snippet either, because it doesn't > really appear to have much to do with the normal function of > rpc_create(). > However as I understand it, the purpose is to create an instance of > struct rpc_clnt when presented with an existing instance of svc_xprt. > It does not currently modify that svc_xprt instance in any way that > breaks layering. All it does is take a reference to the xpt_bc_xprt > field. I see that the initial addition of this logic was done by d531c008d7d9 ("NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp"). It was later moved into rpc_create() by d50039ea5ee6 ("nfsd4/rpc: move backchannel create logic into rpc code"). Probably the essential purpose of this code is explained by a comment that was deleted by d531c008d7d9: - if (args->bc_xprt->xpt_bc_xprt) { - /* - * This server connection already has a backchannel - * transport; we can't create a new one, as we wouldn't - * be able to match replies based on xid any more. So, - * reuse the already-existing one: - */ - return args->bc_xprt->xpt_bc_xprt; - } Is this still a sensible expectation? -- Chuck Lever