On Fri, 2021-08-20 at 13:58 +0000, Chuck Lever III wrote: > > > > On Aug 19, 2021, at 8:14 PM, Trond Myklebust > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote: > > > > > > > > > > On Aug 19, 2021, at 10:14 AM, Trond Myklebust > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote: > > > > > > > > > > > > > > > > On Aug 19, 2021, at 9:01 AM, Trond Myklebust > > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote: > > > > > > > With NFSv4.1+ on RDMA, backchannel recovery appears not to > > > > > > > work. > > > > > > > > > > > > > > xprt_setup_xxx_bc() is invoked by the client's first > > > > > > > CREATE_SESSION > > > > > > > operation, and it always marks the rpc_clnt's transport as > > > > > > > connected. > > > > > > > > > > > > > > On a subsequent CREATE_SESSION, if rpc_create() is called > > > > > > > and > > > > > > > xpt_bc_xprt is populated, it might not be connected (for > > > > > > > instance, > > > > > > > if a backchannel fault has occurred). Ensure that code path > > > > > > > returns > > > > > > > a connected xprt also. > > > > > > > > > > > > > > Reported-by: Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> > > > > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > > > > --- > > > > > > > net/sunrpc/clnt.c | 1 + > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > > > > > index 8b4de70e8ead..570480a649a3 100644 > > > > > > > --- a/net/sunrpc/clnt.c > > > > > > > +++ b/net/sunrpc/clnt.c > > > > > > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct > > > > > > > rpc_create_args *args) > > > > > > > xprt = args->bc_xprt->xpt_bc_xprt; > > > > > > > if (xprt) { > > > > > > > xprt_get(xprt); > > > > > > > + xprt_set_connected(xprt); > > > > > > > return rpc_create_xprt(args, xprt); > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > No. This is wrong. If the connection got disconnected, then > > > > > > the > > > > > > client > > > > > > needs to reconnect and build a new connection altogether. We > > > > > > can't > > > > > > just > > > > > > make pretend that the old connection still exists. > > > > > > > > > > The patch description is not clear: the client has not > > > > > disconnected. > > > > > The forward channel is functioning properly, and the server has > > > > > set > > > > > SEQ4_STATUS_BACKCHANNEL_FAULT. > > > > > > > > > > To recover, the client sends a DESTROY_SESSION / CREATE_SESSION > > > > > pair > > > > > on the existing connection. On the server, > > > > > setup_callback_client() > > > > > invokes rpc_create() again -- it's this step that is failing > > > > > during > > > > > the second CREATE_SESSION on a connection because the old xprt > > > > > is returned but it's still marked disconnected. > > > > > > > > How is that happening? > > > > > > The RPC client's autodisconnect is firing. This was fixed in > > > 6820bf77864d > > > ("svcrdma: disable timeouts on rdma backchannel"). > > > > > > However we're still seeing cases where backchannel recovery is not > > > working. See: > > > > > > https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@xxxxxxxxxxxxxxxx/T/#t > > > > > > As an experiment, we've reverted 6820bf77864d to try to provoke the > > > bug. > > > > > > > > > > As far as I can tell, the only thing that can > > > > cause the backchannel to be marked as closed is a call to > > > > svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops- > > > > > close(xprt->xpt_bc_xprt). > > > > > > > > The server shouldn't be reusing anything from that svc_xprt after > > > > that. > > > > > > > > An alternative would be to ensure that setup_callback_client() > > > > > always puts xpt_bc_xprt before it invokes rpc_create(). But it > > > > > looked to me like rpc_create() already has a bunch of logic to > > > > > deal with an existing xpt_bc_xprt. > > > > > > > > The connection status is managed by the transport layer, not the > > > > RPC > > > > client layer. > > > > > > If it's a bug to mark a backchannel transport disconnected, then > > > asserts should be added to capture the problem. > > > > It isn't a bug to mark the transport as disconnected. It is a bug to > > mark it as disconnected and then to continue using it as if it were > > connected. > > Are you suggesting the server's backchannel logic should > look for -107 status after posting an RPC, and then recover > somehow? Perhaps it could put xpt_bc_xprt then set it to > NULL, and then invoke setup_callback_client() again ? > What I'm saying is that nothing should be calling ->close() on the struct rpc_xprt held by a struct svc_xprt without doing so through a call to svc_delete_xprt(), and that nothing should be calling svc_delete_xprt() without it being part of an effort to close the svc_xprt. > > > > What is the purpose of this code in rpc_create() ? > > > > > > 533 if (args->bc_xprt) { > > > 534 WARN_ON_ONCE(!(args->protocol & > > > XPRT_TRANSPORT_BC)); > > > 535 xprt = args->bc_xprt->xpt_bc_xprt; > > > 536 if (xprt) { > > > 537 xprt_get(xprt); > > > 538 return rpc_create_xprt(args, xprt); > > > 539 } > > > 540 } > > > > > > > Are you asking me or Bruce? > > Fair enough, adding Bruce. > > > > I'm not much of a fan of this code snippet either, because it doesn't > > really appear to have much to do with the normal function of > > rpc_create(). > > However as I understand it, the purpose is to create an instance of > > struct rpc_clnt when presented with an existing instance of svc_xprt. > > It does not currently modify that svc_xprt instance in any way that > > breaks layering. All it does is take a reference to the xpt_bc_xprt > > field. > > I see that the initial addition of this logic was done by > d531c008d7d9 ("NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp"). > It was later moved into rpc_create() by d50039ea5ee6 ("nfsd4/rpc: > move backchannel create logic into rpc code"). > > Probably the essential purpose of this code is explained by a comment > that was deleted by d531c008d7d9: > > - if (args->bc_xprt->xpt_bc_xprt) { > - /* > - * This server connection already has a backchannel > - * transport; we can't create a new one, as we wouldn't > - * be able to match replies based on xid any more. So, > - * reuse the already-existing one: > - */ > - return args->bc_xprt->xpt_bc_xprt; > - } > > Is this still a sensible expectation? > OK, let me clarify where my statements above are coming from and perhaps clarify my language. IMO, we need to distinguish between a closed _transport_ as understood by the RPC layer, and a closed _channel_ as understood by the NFS layer. When the transport is closed at the RPC layer, then it means that the actual transport connection needs to be broken. Both the forward and backward channels on that transport connection are affected, and the client needs to re-establish a transport connection in order to resume communication with the server. Sessions and their channels are managed by the NFS layer, and so if only the back channel is 'closed', then that is a matter for the NFS layer to deal with. It may decide that it needs to shut down the transport, or not. However if it does keep the transport open, then it needs to manage the consequences should something come in over the 'closed' back channel. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx