On Mon, Mar 04, 2024 at 11:08:00PM +0000, Trond Myklebust wrote:
> On Mon, 2024-03-04 at 19:32 +0000, Chuck Lever III wrote:
> >
> >
> > > On Mar 4, 2024, at 2:01 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > >
> > > On Sun, Mar 3, 2024 at 1:35 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Feb 28, 2024 at 04:35:23PM -0500, trondmy@xxxxxxxxxx wrote:
> > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > >
> > > > > It appears that in certain cases, RDMA capable transports can
> > > > > benefit from the ability to establish multiple connections to
> > > > > increase their throughput. This patch therefore enables the use
> > > > > of the "nconnect" mount option for those use cases.
> > > > >
> > > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > >
> > > > No objection to this patch.
> > > >
> > > > You don't mention here if you have root-caused the throughput issue.
> > > > One thing I've noticed is that contention for the transport's
> > > > queue_lock is holding back the RPC/RDMA Receive completion handler,
> > > > which is single-threaded per transport.
> > >
> > > Curious: how is a queue_lock per transport a problem for nconnect?
> > > nconnect would create its own transport, wouldn't it, and so it
> > > would have its own queue_lock (per nconnect).
> >
> > I did not mean to imply that queue_lock contention is a problem for
> > nconnect or would increase when there are multiple transports.
> >
> > But there is definitely lock contention between the send and receive
> > code paths, and that could be one source of the relief that Trond saw
> > by adding more transports. IMO that contention should be addressed at
> > some point.
> >
> > I'm not asking for a change to the proposed patch. But I am suggesting
> > some possible future work.
>
> We were comparing NFS/RDMA performance to that of NFS/TCP, and it was
> clear that the nconnect value was giving the latter a major boost. Once
> we enabled nconnect for the RDMA channel, the values evened out a lot
> more.
>
> Once we fixed the nconnect issue, what we were seeing when the RDMA
> code maxed out was actually that the CPU got pegged running the IB
> completion work queues on writes.
>
> We can certainly look into improving the performance of
> xprt_lookup_rqst() if we have evidence that it is slow, but I'm not yet
> sure that was what we were seeing.

One observation: the Receive completion handler doesn't do anything
that is CPU-intensive. If ib_comp_wq is hot, that's an indication of
lock contention.

I've found there are typically two contended locks when handling
RPC/RDMA Receive completions:

- The workqueue pool lock. Tejun mitigated that issue in v6.7.
- The queue_lock, as described above.

A flame graph might narrow the issue.

--
Chuck Lever
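
[As an illustration of the flame-graph approach suggested above, a
capture with perf and Brendan Gregg's FlameGraph scripts might look
roughly like this; the 30-second window, script paths, and output
file name are placeholders, not taken from the thread:

    # sample all CPUs with call graphs while the write workload runs
    perf record -a -g -- sleep 30

    # fold the sampled stacks and render an SVG flame graph
    perf script | ./FlameGraph/stackcollapse-perf.pl \
        | ./FlameGraph/flamegraph.pl > rpcrdma-receive.svg

If the contention Chuck describes is present, one would typically
expect wide queued_spin_lock_slowpath frames in the kworker stacks
that service ib_comp_wq, rather than time in the completion handler
itself.]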