Re: [PATCH v4 06/30] xprtrdma: Don't wake pending tasks until disconnect is done

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Dec 17, 2018, at 1:55 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2018-12-17 at 13:37 -0500, Chuck Lever wrote:
>>> On Dec 17, 2018, at 12:28 PM, Trond Myklebust <
>>> trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, 2018-12-17 at 11:39 -0500, Chuck Lever wrote:
>>>> Transport disconnect processing does a "wake pending tasks" at
>>>> various points.
>>>> 
>>>> Suppose an RPC Reply is being processed. The RPC task that Reply
>>>> goes with is waiting on the pending queue. If a disconnect wake-
>>>> up
>>>> happens before reply processing is done, that reply, even if it
>>>> is
>>>> good, is thrown away, and the RPC has to be sent again.
>>>> 
>>>> This window apparently does not exist for socket transports
>>>> because
>>>> there is a lock held while a reply is being received which
>>>> prevents
>>>> the wake-up call until after reply processing is done.
>>>> 
>>>> To resolve this, all RPC replies being processed on an RPC-over-
>>>> RDMA
>>>> transport have to complete before pending tasks are awoken due to
>>>> a
>>>> transport disconnect.
>>>> 
>>>> Callers that already hold the transport write lock may invoke
>>>> ->ops->close directly. Others use a generic helper that schedules
>>>> a close when the write lock can be taken safely.
>>>> 
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>> include/linux/sunrpc/xprt.h                |    1 +
>>>> net/sunrpc/xprt.c                          |   19
>>>> +++++++++++++++++++
>>>> net/sunrpc/xprtrdma/backchannel.c          |   13 +++++++------
>>>> net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    8 +++++---
>>>> net/sunrpc/xprtrdma/transport.c            |   16 ++++++++++-----
>>>> -
>>>> net/sunrpc/xprtrdma/verbs.c                |    5 ++---
>>>> 6 files changed, 44 insertions(+), 18 deletions(-)
>>>> 
>>>> diff --git a/include/linux/sunrpc/xprt.h
>>>> b/include/linux/sunrpc/xprt.h
>>>> index a4ab4f8..ee94ed0 100644
>>>> --- a/include/linux/sunrpc/xprt.h
>>>> +++ b/include/linux/sunrpc/xprt.h
>>>> @@ -401,6 +401,7 @@ static inline __be32
>>>> *xprt_skip_transport_header(struct rpc_xprt *xprt, __be32 *
>>>> bool			xprt_request_get_cong(struct rpc_xprt
>>>> *xprt,
>>>> struct rpc_rqst *req);
>>>> void			xprt_disconnect_done(struct rpc_xprt
>>>> *xprt);
>>>> void			xprt_force_disconnect(struct rpc_xprt
>>>> *xprt);
>>>> +void			xprt_disconnect_nowake(struct rpc_xprt
>>>> *xprt);
>>>> void			xprt_conditional_disconnect(struct
>>>> rpc_xprt
>>>> *xprt, unsigned int cookie);
>>>> 
>>>> bool			xprt_lock_connect(struct rpc_xprt *,
>>>> struct
>>>> rpc_task *, void *);
>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>>> index ce92700..afe412e 100644
>>>> --- a/net/sunrpc/xprt.c
>>>> +++ b/net/sunrpc/xprt.c
>>>> @@ -685,6 +685,25 @@ void xprt_force_disconnect(struct rpc_xprt
>>>> *xprt)
>>>> }
>>>> EXPORT_SYMBOL_GPL(xprt_force_disconnect);
>>>> 
>>>> +/**
>>>> + * xprt_disconnect_nowake - force a call to xprt->ops->close
>>>> + * @xprt: transport to disconnect
>>>> + *
>>>> + * The caller must ensure that xprt_wake_pending_tasks() is
>>>> + * called later.
>>>> + */
>>>> +void xprt_disconnect_nowake(struct rpc_xprt *xprt)
>>>> +{
>>>> +       /* Don't race with the test_bit() in xprt_clear_locked()
>>>> */
>>>> +       spin_lock_bh(&xprt->transport_lock);
>>>> +       set_bit(XPRT_CLOSE_WAIT, &xprt->state);
>>>> +       /* Try to schedule an autoclose RPC call */
>>>> +       if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
>>>> +               queue_work(xprtiod_workqueue, &xprt-
>>>>> task_cleanup);
>>>> +       spin_unlock_bh(&xprt->transport_lock);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(xprt_disconnect_nowake);
>>>> +
>>> 
>>> We shouldn't need both xprt_disconnect_nowake() and
>>> xprt_force_disconnect() to be exported given that you can build the
>>> latter from the former + xprt_wake_pending_tasks() (which is also
>>> already exported).
>> 
>> Thanks for your review!
>> 
>> I can get rid of xprt_disconnect_nowake. There are some variations,
>> depending on why wake_pending_tasks is protected by xprt-
>>> transport_lock.
> 
> I'm having some second thoughts about the patch that Scott sent out
> last week to fix the issue that Dave and he were seeing. I think that
> what we really need to do to fix his issue is to call
> xprt_disconnect_done() after we've released the TCP socket.
> 
> Given that realisation, I think that we can fix up
> xprt_force_disconnect() to only wake up the task that holds the
> XPRT_LOCKED instead of doing a thundering herd wakeup like we do today.
> That should (I think) obviate the need for a separate
> xprt_disconnect_nowake().

For RPC-over-RDMA, there really is a dangerous race between the waking
task(s) and work being done by the deferred RPC completion handler. IMO
the only safe thing RPC-over-RDMA can do is not wake anything until the
deferred queue is well and truly drained.

As you observed when we last spoke, socket transports are already
protected from this race by the socket lock.... RPC-over-RDMA is going
to have to be more careful.


> A patch is forthcoming later today. I'll make sure you are Cced so you
> can comment.
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever







[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux