Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2021-08-19 at 14:34 +0000, Chuck Lever III wrote:
> 
> 
> > On Aug 19, 2021, at 10:14 AM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Aug 19, 2021, at 9:01 AM, Trond Myklebust
> > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
> > > > > With NFSv4.1+ on RDMA, backchannel recovery appears not to
> > > > > work.
> > > > > 
> > > > > xprt_setup_xxx_bc() is invoked by the client's first
> > > > > CREATE_SESSION
> > > > > operation, and it always marks the rpc_clnt's transport as
> > > > > connected.
> > > > > 
> > > > > On a subsequent CREATE_SESSION, if rpc_create() is called and
> > > > > xpt_bc_xprt is populated, it might not be connected (for
> > > > > instance,
> > > > > if a backchannel fault has occurred). Ensure that code path
> > > > > returns
> > > > > a connected xprt also.
> > > > > 
> > > > > Reported-by: Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx>
> > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > > ---
> > > > >  net/sunrpc/clnt.c |    1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > index 8b4de70e8ead..570480a649a3 100644
> > > > > --- a/net/sunrpc/clnt.c
> > > > > +++ b/net/sunrpc/clnt.c
> > > > > @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
> > > > > rpc_create_args *args)
> > > > >                 xprt = args->bc_xprt->xpt_bc_xprt;
> > > > >                 if (xprt) {
> > > > >                         xprt_get(xprt);
> > > > > +                       xprt_set_connected(xprt);
> > > > >                         return rpc_create_xprt(args, xprt);
> > > > >                 }
> > > > >         }
> > > > > 
> > > > > 
> > > > 
> > > > No. This is wrong. If the connection got disconnected, then the
> > > > client
> > > > needs to reconnect and build a new connection altogether. We
> > > > can't
> > > > just
> > > > make pretend that the old connection still exists.
> > > 
> > > The patch description is not clear: the client has not
> > > disconnected.
> > > The forward channel is functioning properly, and the server has
> > > set
> > > SEQ4_STATUS_BACKCHANNEL_FAULT.
> > > 
> > > To recover, the client sends a DESTROY_SESSION / CREATE_SESSION
> > > pair
> > > on the existing connection. On the server,
> > > setup_callback_client()
> > > invokes rpc_create() again -- it's this step that is failing
> > > during
> > > the second CREATE_SESSION on a connection because the old xprt
> > > is returned but it's still marked disconnected.
> > 
> > How is that happening?
> 
> The RPC client's autodisconnect is firing. This was fixed in
> 6820bf77864d
> ("svcrdma: disable timeouts on rdma backchannel").
> 
> However we're still seeing cases where backchannel recovery is not
> working. See:
> 
> https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@xxxxxxxxxxxxxxxx/T/#t
> 
> As an experiment, we've reverted 6820bf77864d to try to provoke the
> bug.
> 
> 
> > As far as I can tell, the only thing that can
> > cause the backchannel to be marked as closed is a call to
> > svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops-
> > >close(xprt->xpt_bc_xprt).
> > 
> > The server shouldn't be reusing anything from that svc_xprt after
> > that.
> 
> > > An alternative would be to ensure that setup_callback_client()
> > > always puts xpt_bc_xprt before it invokes rpc_create(). But it
> > > looked to me like rpc_create() already has a bunch of logic to
> > > deal with an existing xpt_bc_xprt.
> > 
> > The connection status is managed by the transport layer, not the
> > RPC
> > client layer.
> 
> If it's a bug to mark a backchannel transport disconnected, then
> asserts should be added to capture the problem.

It isn't a bug to mark the transport as disconnected. It is a bug to
mark it as disconnected and then to continue using it as if it were
connected.

> 
> What is the purpose of this code in rpc_create() ?
> 
>  533         if (args->bc_xprt) {
>  534                 WARN_ON_ONCE(!(args->protocol &
> XPRT_TRANSPORT_BC));
>  535                 xprt = args->bc_xprt->xpt_bc_xprt;
>  536                 if (xprt) {
>  537                         xprt_get(xprt);
>  538                         return rpc_create_xprt(args, xprt);
>  539                 }
>  540         }
> 

Are you asking me or Bruce?

I'm not much of a fan of this code snippet either, because it doesn't
really appear to have much to do with the normal function of
rpc_create().
However as I understand it, the purpose is to create an instance of
struct rpc_clnt when presented with an existing instance of svc_xprt.
It does not currently modify that svc_xprt instance in any way that
breaks layering. All it does is take a reference to the xpt_bc_xprt
field.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux