Re: [PATCH] NFS: enable nconnect for RDMA

> On Mar 4, 2024, at 2:01 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Sun, Mar 3, 2024 at 1:35 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> On Wed, Feb 28, 2024 at 04:35:23PM -0500, trondmy@xxxxxxxxxx wrote:
>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> 
>>> It appears that in certain cases, RDMA-capable transports can benefit
>>> from the ability to establish multiple connections to increase their
>>> throughput. This patch therefore enables the use of the "nconnect" mount
>>> option for those use cases.
>>> 
>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> 
>> No objection to this patch.
>> 
>> You don't mention here if you have root-caused the throughput issue.
>> One thing I've noticed is that contention for the transport's
>> queue_lock is holding back the RPC/RDMA Receive completion handler,
>> which is single-threaded per transport.
> 
> Curious: how is a per-transport queue_lock a problem for
> nconnect? nconnect would create its own transport, would it not,
> and so each connection would have its own queue_lock.

I did not mean to imply that queue_lock contention is a
problem for nconnect or would increase when there are
multiple transports.

But there is definitely lock contention between the send and
receive code paths, and that could be one source of the relief
that Trond saw by adding more transports. IMO that contention
should be addressed at some point.
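
To illustrate (paraphrasing from memory of net/sunrpc/xprt.c and
net/sunrpc/xprtrdma/rpc_rdma.c; treat this as a sketch, not the
literal upstream code), both paths serialize on the same
xprt->queue_lock:

    /* Send side: queue the request to wait for its reply */
    spin_lock(&xprt->queue_lock);
    xprt_request_rb_insert(xprt, req);  /* recv_queue R-B tree */
    spin_unlock(&xprt->queue_lock);

    /* Receive side: single-threaded per transport on RPC/RDMA */
    spin_lock(&xprt->queue_lock);       /* same lock as the send side */
    rqst = xprt_lookup_rqst(xprt, rep->rr_xid);
    if (rqst)
            xprt_pin_rqst(rqst);
    spin_unlock(&xprt->queue_lock);

Every Receive completion has to wait its turn on that lock behind
the send path's insertions.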

I'm not asking for a change to the proposed patch. But I am
suggesting some possible future work.


>> A way to mitigate this would be to replace the recv_queue
>> R-B tree with a data structure that can perform a lookup while
>> holding only the RCU read lock. I have experimented with using an
>> xarray for the recv_queue, and saw improvement.
>> 
>> The workload on that data structure is different for TCP versus
>> RDMA, though: on RDMA, the number of RPCs in flight is significantly
>> smaller. For tens of thousands of RPCs in flight, an xarray might
>> not scale well. The newer Maple tree or rose bush hash, or maybe a
>> more classic data structure like rhashtable, might handle this
>> workload better.
>> 
>> It's also worth considering deleting each RPC from the recv_queue
>> in a less performance-sensitive context, for example, xprt_release,
>> rather than in xprt_complete_rqst.
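
To make the xarray idea concrete, here is a rough, untested sketch
(assume <linux/xarray.h> and <linux/sunrpc/xprt.h>); the recv_xa
field and the helper names are hypothetical, and it glosses over
object lifetime (an rpc_rqst would have to stay valid across an RCU
grace period for the lockless lookup to be safe):

    /* Hypothetical: struct rpc_xprt grows "struct xarray recv_xa;",
     * indexed by the request's XID. */

    /* Send side: publish the request that is waiting for a reply */
    static int xprt_request_publish(struct rpc_xprt *xprt,
                                    struct rpc_rqst *req)
    {
            return xa_err(xa_store(&xprt->recv_xa,
                                   be32_to_cpu(req->rq_xid),
                                   req, GFP_KERNEL));
    }

    /* Receive completion: look up the request under only the
     * RCU read lock, without touching xprt->queue_lock */
    static struct rpc_rqst *xprt_lookup_rqst_rcu(struct rpc_xprt *xprt,
                                                 __be32 xid)
    {
            struct rpc_rqst *req;

            rcu_read_lock();
            req = xa_load(&xprt->recv_xa, be32_to_cpu(xid));
            if (req)
                    xprt_pin_rqst(req);     /* hold it past unlock */
            rcu_read_unlock();
            return req;
    }

    /* Removal moves from xprt_complete_rqst() into xprt_release(),
     * out of the completion hot path */
    static void xprt_request_unpublish(struct rpc_xprt *xprt,
                                       struct rpc_rqst *req)
    {
            xa_erase(&xprt->recv_xa, be32_to_cpu(req->rq_xid));
    }

A Maple tree or rhashtable variant would look much the same; the
point is only that the Receive path stops taking queue_lock.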


--
Chuck Lever





