Re: [PATCH] NFS: enable nconnect for RDMA

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 4 Mar 2024 23:08:00 +0000

On Mon, 2024-03-04 at 19:32 +0000, Chuck Lever III wrote:
> 
> 
> > On Mar 4, 2024, at 2:01 PM, Olga Kornievskaia <aglo@xxxxxxxxx>
> > wrote:
> > 
> > On Sun, Mar 3, 2024 at 1:35 PM Chuck Lever <chuck.lever@xxxxxxxxxx>
> > wrote:
> > > 
> > > On Wed, Feb 28, 2024 at 04:35:23PM -0500,
> > > trondmy@xxxxxxxxxx wrote:
> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > 
> > > > It appears that in certain cases, RDMA capable transports can
> > > > benefit from the ability to establish multiple connections to
> > > > increase their throughput. This patch therefore enables the use
> > > > of the "nconnect" mount option for those use cases.
> > > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@xxxxxxxxxxxxxxx>
> > > 
> > > No objection to this patch.
> > > 
> > > You don't mention here if you have root-caused the throughput
> > > issue. One thing I've noticed is that contention for the
> > > transport's queue_lock is holding back the RPC/RDMA Receive
> > > completion handler, which is single-threaded per transport.
> > 
> > Curious: how is a queue_lock per transport a problem for
> > nconnect? nconnect would create its own transport, would it not,
> > and so it would have its own queue_lock (per nconnect).
> 
> I did not mean to imply that queue_lock contention is a
> problem for nconnect or would increase when there are
> multiple transports.
> 
> But there is definitely lock contention between the send and
> receive code paths, and that could be one source of the relief
> that Trond saw by adding more transports. IMO that contention
> should be addressed at some point.
> 
> I'm not asking for a change to the proposed patch. But I am
> suggesting some possible future work.
> 

We were comparing NFS/RDMA performance to that of NFS/TCP, and it was
clear that the nconnect value was giving the latter a major boost. Once
we enabled nconnect for the RDMA channel, the numbers evened out a lot
more.
Once we fixed the nconnect issue, what we saw when the RDMA code maxed
out was that the CPU got pegged running the IB completion work queues
on writes.
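
For anyone wanting to repeat the comparison, a mount along these lines
is roughly what is meant. This is only a sketch: the server name,
export path, mount point, nconnect value and the helper function name
are placeholders, not values from our testing, and it assumes a client
kernel that accepts nconnect on RDMA mounts. The helper only prints the
command, so it is safe to run as-is.

```shell
#!/bin/sh
# Build (but do not run) a mount command for NFS over RDMA with nconnect.
# nfs_rdma_mount_cmd is a hypothetical helper for illustration only.
nfs_rdma_mount_cmd() {
    server_export=$1   # placeholder, e.g. server.example.com:/export
    mountpoint=$2      # placeholder, e.g. /mnt/nfs
    nconnect=$3        # number of connections to open (the client caps this at 16)
    # proto=rdma selects the RDMA transport; 20049 is the registered
    # NFS/RDMA port. nconnect asks the client for several connections.
    printf 'mount -t nfs -o proto=rdma,port=20049,nconnect=%s %s %s\n' \
        "$nconnect" "$server_export" "$mountpoint"
}

# Print the command for a 4-connection mount; run the printed line as
# root to perform the actual mount.
nfs_rdma_mount_cmd server.example.com:/export /mnt/nfs 4
```

Swapping proto=rdma,port=20049 for proto=tcp with the same nconnect
value gives the TCP side of the comparison.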

We can certainly look into improving the performance of
xprt_lookup_rqst() if we have evidence that it is slow, but I'm not yet
sure that was what we were seeing.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx