Re: [PATCH v1] SUNRPC: Ensure backchannel transports are marked connected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Aug 19, 2021, at 10:14 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 2021-08-19 at 13:16 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Aug 19, 2021, at 9:01 AM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Thu, 2021-08-19 at 08:29 -0400, Chuck Lever wrote:
>>>> With NFSv4.1+ on RDMA, backchannel recovery appears not to work.
>>>> 
>>>> xprt_setup_xxx_bc() is invoked by the client's first
>>>> CREATE_SESSION
>>>> operation, and it always marks the rpc_clnt's transport as
>>>> connected.
>>>> 
>>>> On a subsequent CREATE_SESSION, if rpc_create() is called and
>>>> xpt_bc_xprt is populated, it might not be connected (for
>>>> instance,
>>>> if a backchannel fault has occurred). Ensure that code path
>>>> returns
>>>> a connected xprt also.
>>>> 
>>>> Reported-by: Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>>  net/sunrpc/clnt.c |    1 +
>>>>  1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 8b4de70e8ead..570480a649a3 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -535,6 +535,7 @@ struct rpc_clnt *rpc_create(struct
>>>> rpc_create_args *args)
>>>>                 xprt = args->bc_xprt->xpt_bc_xprt;
>>>>                 if (xprt) {
>>>>                         xprt_get(xprt);
>>>> +                       xprt_set_connected(xprt);
>>>>                         return rpc_create_xprt(args, xprt);
>>>>                 }
>>>>         }
>>>> 
>>>> 
>>> 
>>> No. This is wrong. If the connection got disconnected, then the
>>> client
>>> needs to reconnect and build a new connection altogether. We can't
>>> just
>>> make pretend that the old connection still exists.
>> 
>> The patch description is not clear: the client has not disconnected.
>> The forward channel is functioning properly, and the server has set
>> SEQ4_STATUS_BACKCHANNEL_FAULT.
>> 
>> To recover, the client sends a DESTROY_SESSION / CREATE_SESSION pair
>> on the existing connection. On the server, setup_callback_client()
>> invokes rpc_create() again -- it's this step that is failing during
>> the second CREATE_SESSION on a connection because the old xprt
>> is returned but it's still marked disconnected.
> 
> How is that happening?

The RPC client's autodisconnect is firing. This was fixed in 6820bf77864d
("svcrdma: disable timeouts on rdma backchannel").

However we're still seeing cases where backchannel recovery is not
working. See:

https://lore.kernel.org/linux-nfs/c3e31e57-15fa-8f70-bbb8-af5452c1f5fc@xxxxxxxxxxxxxxxx/T/#t

As an experiment, we've reverted 6820bf77864d to try to provoke the bug.


> As far as I can tell, the only thing that can
> cause the backchannel to be marked as closed is a call to
> svc_delete_xprt(), which explicitly calls xprt->xpt_bc_xprt->ops->close(xprt->xpt_bc_xprt).
> 
> The server shouldn't be reusing anything from that svc_xprt after that.

>> An alternative would be to ensure that setup_callback_client()
>> always puts xpt_bc_xprt before it invokes rpc_create(). But it
>> looked to me like rpc_create() already has a bunch of logic to
>> deal with an existing xpt_bc_xprt.
> 
> The connection status is managed by the transport layer, not the RPC
> client layer.

If it's a bug to mark a backchannel transport disconnected, then
asserts should be added to capture the problem.

What is the purpose of this code in rpc_create() ?

 533         if (args->bc_xprt) {
 534                 WARN_ON_ONCE(!(args->protocol & XPRT_TRANSPORT_BC));
 535                 xprt = args->bc_xprt->xpt_bc_xprt;
 536                 if (xprt) {
 537                         xprt_get(xprt);
 538                         return rpc_create_xprt(args, xprt);
 539                 }
 540         }


--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux