Re: [PATCH] NFS: enable nconnect for RDMA

On Wed, Feb 28, 2024 at 04:35:23PM -0500, trondmy@xxxxxxxxxx wrote:
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> 
> It appears that in certain cases, RDMA-capable transports can benefit
> from establishing multiple connections to increase their throughput.
> This patch therefore enables the "nconnect" mount option for those
> use cases.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>

No objection to this patch.

You don't mention whether you have root-caused the throughput issue.
One thing I've noticed is that contention for the transport's
queue_lock holds back the RPC/RDMA Receive completion handler, which
is single-threaded per transport.

One way to mitigate this would be to replace the recv_queue rbtree
with a data structure that can be looked up while holding only the
RCU read lock. I have experimented with using an xarray for the
recv_queue and saw an improvement.

The workload on that data structure is different for TCP versus
RDMA, though: on RDMA, the number of RPCs in flight is significantly
smaller. For tens of thousands of RPCs in flight, an xarray might
not scale well. The newer Maple tree or rose bush hash, or maybe a
more classic data structure like rhashtable, might handle this
workload better.

It's also worth considering deleting each RPC from the recv_queue
in a less performance-sensitive context, for example, xprt_release,
rather than in xprt_complete_rqst.


> ---
>  fs/nfs/nfs3client.c | 1 +
>  fs/nfs/nfs4client.c | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> index 674c012868b1..b0c8a39c2bbd 100644
> --- a/fs/nfs/nfs3client.c
> +++ b/fs/nfs/nfs3client.c
> @@ -111,6 +111,7 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
>  	cl_init.hostname = buf;
>  
>  	switch (ds_proto) {
> +	case XPRT_TRANSPORT_RDMA:
>  	case XPRT_TRANSPORT_TCP:
>  	case XPRT_TRANSPORT_TCP_TLS:
>  		if (mds_clp->cl_nconnect > 1)
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 11e3a285594c..84573df5cf5a 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -924,6 +924,7 @@ static int nfs4_set_client(struct nfs_server *server,
>  	else
>  		cl_init.max_connect = max_connect;
>  	switch (proto) {
> +	case XPRT_TRANSPORT_RDMA:
>  	case XPRT_TRANSPORT_TCP:
>  	case XPRT_TRANSPORT_TCP_TLS:
>  		cl_init.nconnect = nconnect;
> @@ -1000,6 +1001,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
>  	cl_init.hostname = buf;
>  
>  	switch (ds_proto) {
> +	case XPRT_TRANSPORT_RDMA:
>  	case XPRT_TRANSPORT_TCP:
>  	case XPRT_TRANSPORT_TCP_TLS:
>  		if (mds_clp->cl_nconnect > 1) {
> -- 
> 2.44.0
> 
> 

-- 
Chuck Lever



