Re: RFC: xprt_lock_connect

> On Aug 8, 2018, at 5:19 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Wed, 2018-08-08 at 15:53 -0400, Chuck Lever wrote:
>> Hi Trond-
>> 
>> For maintainability and legibility, I'm attempting to make the
>> the RPC/RDMA connect logic act a little more like the TCP
>> connect logic. I noticed this patch:
>> 
>> commit 718ba5b87343df303017585200ee182e937eabfc
>> Author:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> AuthorDate: Sun Feb 8 18:19:25 2015 -0500
>> Commit:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> CommitDate: Sun Feb 8 21:47:29 2015 -0500
>> 
>>    SUNRPC: Add helpers to prevent socket create from racing
>> 
>>    The socket lock is currently held by the task that is requesting the
>>    connection be established. While that is efficient in the case where
>>    the connection happens quickly, it is racy in the case where it
>>    doesn't.
>>    What we really want is for the connect helper to be able to block
>>    access to the socket while it is being set up.
>> 
>>    This patch does so by arranging to transfer the socket lock from the
>>    task that is requesting the connect attempt, and then releasing that
>>    lock once everything is done.
>>    This scheme also gives us automatic protection against collisions
>>    with the RPC close code, so we can kill the cancel_delayed_work_sync()
>>    call in xs_close().
>> 
>>    Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> 
>> Your patch description refers to "the socket lock" but it appears
>> to be manipulating the transport send lock. The new helpers are
>> added in net/sunrpc/xprt.c, not in xprtsock.c. And, there is a
>> hunk that changes the generic xprt_connect() function.
>> 
>> Therefore, I'm wondering if these helpers are of value also to
>> other transports that make use of connections. Should xprtrdma
>> adopt the use of these helpers too?
>> 
> 
> Heh... You're too late.
> 
> I'm pretty sure that lock is one of the biggest bottlenecks in the TCP
> send path today and so I'm in the process of rewriting that code.

> The problem is this: the above lock serialises all RPC tasks that go to
> a single socket, which is sort of desirable. However it does so by
> passing the lock from rpc_task to rpc_task, and so when the lock is
> contended, we end up putting each task to sleep, it waits for its turn
> in the queue, then the task gets scheduled on the workqueue, it does
> its thing (encodes the XDR, and puts itself through the socket), and
> then wakes up the next task.
> IOW: each time we hand off the lock, we take a workqueue queuing hit,
> with the task having to wait until any other send task or reply that is
> ahead of it in the queue has run. We also do the XDR encoding under the
> lock, further adding latency...

Indeed, RPC wake-ups are a major, if not the primary, limiter on RPC Call
rate on a contended transport. The transport send lock is a top source
of sleeps and wake-ups. The backlog queue is similar.

Fewer wake-ups will help reduce both per-RPC latency and latency variance.


> What I want to do (and am in the process of coding) is to convert that
> whole thing into a queue, where if the lock is contended, the incoming
> task simply adds itself to the queue and then puts itself to sleep.
> Then whoever actually first obtains the lock is tasked with trying to
> drain the queue by sending as many of those queued requests as will
> actually fit in the socket buffer.
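
As an illustration of that pattern, here is a minimal user-space model, with
pthreads standing in for rpc_tasks and a printf standing in for the socket
write: whichever caller finds nobody draining becomes the drainer and sends
everything queued behind it. All of the names are made up for the sketch, it
deliberately ignores the socket-buffer-full case, and it is not the SUNRPC
code.

/*
 * Minimal user-space model of the "enqueue and let one caller drain"
 * idea.  Illustrative only; not the SUNRPC implementation.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct req {
	int id;
	bool sent;
	struct req *next;
};

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sent_cond  = PTHREAD_COND_INITIALIZER;
static struct req *head, *tail;		/* FIFO of requests waiting to be sent */
static bool draining;			/* true while one caller owns the send path */

/* Stand-in for copying the request into the socket buffer. */
static void transmit(struct req *r)
{
	printf("sent request %d\n", r->id);
}

/*
 * Each "task" pre-encodes its request before calling this, queues it,
 * and then either drains the queue itself or sleeps until the current
 * drainer has pushed its request out.  The lock is never handed from
 * sleeper to sleeper, so there is no wake-up just to pass it along.
 */
static void send_request(struct req *r)
{
	pthread_mutex_lock(&queue_lock);
	r->sent = false;
	r->next = NULL;
	if (tail)
		tail->next = r;
	else
		head = r;
	tail = r;

	if (!draining) {
		draining = true;
		while (head) {
			struct req *q = head;
			head = q->next;
			if (!head)
				tail = NULL;
			pthread_mutex_unlock(&queue_lock);
			transmit(q);			/* send outside the queue lock */
			pthread_mutex_lock(&queue_lock);
			q->sent = true;
			pthread_cond_broadcast(&sent_cond);
		}
		draining = false;
	} else {
		while (!r->sent)
			pthread_cond_wait(&sent_cond, &queue_lock);
	}
	pthread_mutex_unlock(&queue_lock);
}

static void *worker(void *arg)
{
	send_request(arg);
	return NULL;
}

int main(void)
{
	pthread_t threads[4];
	struct req reqs[4];

	for (int i = 0; i < 4; i++) {
		reqs[i].id = i;
		pthread_create(&threads[i], NULL, worker, &reqs[i]);
	}
	for (int i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

The point of the shape is that a contended sender takes exactly one
sleep/wake-up (for its own completion) instead of one to acquire the lock
plus another to hand it on.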

I've also been exploring replacements for the xprt reserve/release mechanism
in xprtrdma, which needs a new one for a number of reasons.

For instance, note the architectural differences:

 - RPC/TCP : stream-oriented, connection-oriented, no RPC-level congestion control (TCP provides its own)
 - RPC/UDP : datagram-oriented, not connection-oriented, needs RPC-level congestion control
 - RPC/RDMA : datagram-oriented, connection-oriented, needs RPC-level congestion control

xprtrdma currently uses the UDP congestion control scheme, but it's not a
perfect fit. For instance, rq_cong does not work well across a reconnect,
where the congestion accounting has to be reset while the new connection is
established.
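
To make that concrete, here is a stand-alone sketch of credit-based
congestion accounting across a reconnect. The names (struct ctransport,
credit_get, credits_reset, holds_credit) are illustrative, not the
xprt/xprtrdma fields; holds_credit simply plays the role rq_cong plays in
the real code.

/*
 * Stand-alone sketch of credit accounting across a reconnect; names are
 * illustrative, not the SUNRPC/xprtrdma ones.
 */
#include <stdbool.h>
#include <stddef.h>

struct creq {
	bool holds_credit;		/* plays the role of rq_cong */
};

struct ctransport {
	unsigned int credits;		/* send limit granted by the peer */
	unsigned int in_flight;		/* requests currently holding a credit */
	struct creq *reqs;		/* all requests owned by this transport */
	size_t nr_reqs;
};

/* A request may be transmitted only while it holds a credit. */
bool credit_get(struct ctransport *xprt, struct creq *req)
{
	if (req->holds_credit)
		return true;
	if (xprt->in_flight >= xprt->credits)
		return false;		/* caller must wait for a reply to free a slot */
	xprt->in_flight++;
	req->holds_credit = true;
	return true;
}

/* Called when the reply arrives (or the request is torn down). */
void credit_put(struct ctransport *xprt, struct creq *req)
{
	if (!req->holds_credit)
		return;
	req->holds_credit = false;
	xprt->in_flight--;
}

/*
 * On reconnect the peer's credit grant starts over, so the accounting
 * must start over with it.  If requests kept credits acquired on the old
 * connection, in_flight would no longer match what the new connection
 * can actually accept.
 */
void credits_reset(struct ctransport *xprt, unsigned int new_limit)
{
	xprt->credits = new_limit;
	xprt->in_flight = 0;
	for (size_t i = 0; i < xprt->nr_reqs; i++)
		xprt->reqs[i].holds_credit = false;
}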

What changes do you intend to make to the UDP congestion control mechanism?

Would it be helpful to split out xprtrdma's reserve/release first before
going down this road?


> The assumption is that we can perform the XDR encoding before we
> enqueue the request (so we move that out of the lock too), which is
> usually the case, but might be a little awkward when doing RPCSEC_GSS.

The GSS sequence number will be an interesting challenge.
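
To illustrate why: RFC 2203 has the server keep a sequence window and
silently drop requests whose sequence number has fallen below it, so a
sequence number fixed at encode time can go stale while the request waits
behind later sends. A simplified model of the server-side rule (window size
and names are illustrative):

/*
 * Simplified model of the RPCSEC_GSS sequence window from RFC 2203.
 */
#include <stdbool.h>
#include <stdint.h>

#define GSS_SEQ_WINDOW 128		/* a typical window size */

struct gss_svc_ctx {
	uint32_t seq_max;		/* highest sequence number seen so far */
};

/*
 * A request whose sequence number has dropped below (seq_max - window)
 * is silently discarded, so a client that assigns the sequence number at
 * encode time must not let the request sit behind more than a window's
 * worth of later sends.
 */
bool gss_seq_acceptable(struct gss_svc_ctx *ctx, uint32_t seqnum)
{
	if (seqnum > ctx->seq_max) {
		ctx->seq_max = seqnum;	/* window slides forward */
		return true;
	}
	if (ctx->seq_max - seqnum >= GSS_SEQ_WINDOW)
		return false;		/* below the window: dropped without a reply */
	return true;			/* inside the window (duplicate detection omitted) */
}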

As you consider the internal APIs, please keep in mind issues related to
XDR encoding for xprtrdma:

- Today, xprtrdma schedules memory registration and constructs the
transport header in ->send_request, while the transport send lock is held.
For a retransmit on a new connection, for example, RPC/RDMA requires that
memory be re-registered, which means the transport header has to be
constructed again with the new R_keys (see the sketch after this list).

- Scheduling memory registration outside the transport send lock might be
beneficial, but today xprtrdma relies on that lock to avoid fine-grained
locking of the transport resources it uses while registering memory and
constructing the transport header.
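
Here is a hypothetical, self-contained sketch of the ordering constraint in
the first bullet: R_keys come from registration, registration is tied to a
connection, and the transport header embeds the R_keys, so a retransmit on a
new connection has to redo both steps. None of these names are the xprtrdma
ones.

/*
 * Hypothetical sketch, not the xprtrdma code: register_chunk() stands in
 * for memory registration, and the fake rkey it produces changes with
 * every connection, which is why build_header() must run again after a
 * reconnect.  Assumes requests start zero-initialized and connection
 * cookies start at 1.
 */
#include <stdint.h>

struct chunk {
	uint32_t rkey;			/* handle the server uses for RDMA Read/Write */
	uint32_t length;
	uint64_t offset;
};

struct rdma_req {
	unsigned int connect_cookie;	/* which connection the rkey belongs to */
	struct chunk chunk;
	uint32_t header[16];		/* stand-in for the RPC/RDMA transport header */
};

/* Stand-in for memory registration: a new connection yields a new rkey. */
static void register_chunk(struct rdma_req *req, unsigned int current_cookie)
{
	req->chunk.rkey = current_cookie * 1000 + 1;	/* fake rkey */
	req->connect_cookie = current_cookie;
}

/* The header carries the rkey, so it is valid only for the registration
 * it was built against. */
static void build_header(struct rdma_req *req)
{
	req->header[0] = req->chunk.rkey;
}

/* Called for the initial send and for every retransmit. */
void prepare_for_send(struct rdma_req *req, unsigned int current_cookie)
{
	if (req->connect_cookie != current_cookie)
		register_chunk(req, current_cookie);	/* re-register on a new connection */
	build_header(req);				/* rebuild the header with the current rkey */
}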


--
Chuck Lever





