On Wed, 2018-08-08 at 15:53 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> For maintainability and legibility, I'm attempting to make the
> RPC/RDMA connect logic act a little more like the TCP connect
> logic. I noticed this patch:
> 
> commit 718ba5b87343df303017585200ee182e937eabfc
> Author:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> AuthorDate: Sun Feb 8 18:19:25 2015 -0500
> Commit:     Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> CommitDate: Sun Feb 8 21:47:29 2015 -0500
> 
>     SUNRPC: Add helpers to prevent socket create from racing
> 
>     The socket lock is currently held by the task that is requesting the
>     connection be established. While that is efficient in the case where
>     the connection happens quickly, it is racy in the case where it
>     doesn't. What we really want is for the connect helper to be able to
>     block access to the socket while it is being set up.
> 
>     This patch does so by arranging to transfer the socket lock from the
>     task that is requesting the connect attempt, and then releasing that
>     lock once everything is done.
>     This scheme also gives us automatic protection against collisions with
>     the RPC close code, so we can kill the cancel_delayed_work_sync()
>     call in xs_close().
> 
>     Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> 
> Your patch description refers to "the socket lock" but it appears
> to be manipulating the transport send lock. The new helpers are
> added in net/sunrpc/xprt.c, not in xprtsock.c. And, there is a
> hunk that changes the generic xprt_connect() function.
> 
> Therefore, I'm wondering if these helpers are of value also to
> other transports that make use of connections. Should xprtrdma
> adopt the use of these helpers too?

Heh... You're too late. ☺☺☺

I'm pretty sure that lock is one of the biggest bottlenecks in the TCP
send path today, and so I'm in the process of rewriting that code.

The problem is this: the above lock serialises all RPC tasks that go to
a single socket, which is sort of desirable. However, it does so by
passing the lock from rpc_task to rpc_task, so when the lock is
contended we end up putting each task to sleep; it waits for its turn
in the queue, then gets scheduled on the workqueue, does its thing
(encodes the XDR and puts itself through the socket), and then wakes up
the next task.

IOW: each time we hand off the lock, we take a workqueue queuing hit,
with the task having to wait until any other send task or reply that is
ahead of it in the queue has run. We also do the XDR encoding under the
lock, further adding latency...

What I want to do (and am in the process of coding) is to convert that
whole thing into a queue: if the lock is contended, the incoming task
simply adds itself to the queue and then puts itself to sleep. Whoever
actually obtains the lock first is then tasked with trying to drain the
queue by sending as many of those queued requests as will actually fit
in the socket buffer. The assumption is that we can perform the XDR
encoding before we enqueue the request (so that moves out of the lock
too), which is usually the case, but might be a little awkward when
doing RPCSEC_GSS.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
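
P.S. To make the queue idea a bit more concrete, here is a rough
user-space sketch of the pattern I have in mind. This is not the actual
SUNRPC code and all the names are made up; it just shows the shape: a
sender enqueues an already-encoded request and sleeps until it has gone
out, and whoever finds the send path idle drains the whole queue, so the
lock is never handed off request by request.

/*
 * Toy illustration only (pthreads); nothing like the final kernel code.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct pending_req {
	int xid;			/* stand-in for a pre-encoded RPC */
	bool sent;
	struct pending_req *next;
};

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sent_cond = PTHREAD_COND_INITIALIZER;
static struct pending_req *queue_head, **queue_tail = &queue_head;
static bool draining;			/* someone owns the send path */

static void transmit(struct pending_req *req)
{
	/* stands in for pushing the encoded request into the socket */
	printf("sent xid %d\n", req->xid);
}

static void send_request(struct pending_req *req)
{
	req->sent = false;
	req->next = NULL;

	pthread_mutex_lock(&queue_lock);
	*queue_tail = req;		/* enqueue; XDR encoding already done */
	queue_tail = &req->next;

	if (!draining) {
		/* we own the send path: drain everything that is queued */
		draining = true;
		while (queue_head) {
			struct pending_req *next = queue_head;

			queue_head = next->next;
			if (!queue_head)
				queue_tail = &queue_head;
			pthread_mutex_unlock(&queue_lock);
			transmit(next);	/* socket I/O outside the lock */
			pthread_mutex_lock(&queue_lock);
			next->sent = true;
			pthread_cond_broadcast(&sent_cond);
		}
		draining = false;
	}

	/* sleep until our own request has been transmitted */
	while (!req->sent)
		pthread_cond_wait(&sent_cond, &queue_lock);
	pthread_mutex_unlock(&queue_lock);
}

int main(void)
{
	/* single-threaded demo; in reality many tasks call this at once */
	struct pending_req reqs[3] = { { .xid = 1 }, { .xid = 2 }, { .xid = 3 } };

	for (int i = 0; i < 3; i++)
		send_request(&reqs[i]);
	return 0;
}

The real thing obviously also has to cope with the socket buffer
filling up and with reply handling, but that's the basic shape.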