On Sun, Mar 3, 2024 at 1:35 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On Wed, Feb 28, 2024 at 04:35:23PM -0500, trondmy@xxxxxxxxxx wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > It appears that in certain cases, RDMA capable transports can benefit
> > from the ability to establish multiple connections to increase their
> > throughput. This patch therefore enables the use of the "nconnect" mount
> > option for those use cases.
> >
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>
> No objection to this patch.
>
> You don't mention here if you have root-caused the throughput issue.
> One thing I've noticed is that contention for the transport's
> queue_lock is holding back the RPC/RDMA Receive completion handler,
> which is single-threaded per transport.

Curious: how is a per-transport queue_lock a problem for nconnect?
nconnect would create its own transport for each connection, wouldn't
it, and so each connection would have its own queue_lock.

> A way to mitigate this would be to replace the recv_queue
> R-B tree with a data structure that can perform a lookup while
> holding only the RCU read lock. I have experimented with using an
> xarray for the recv_queue, and saw improvement.
>
> The workload on that data structure is different for TCP versus
> RDMA, though: on RDMA, the number of RPCs in flight is significantly
> smaller. For tens of thousands of RPCs in flight, an xarray might
> not scale well. The newer Maple tree or rose bush hash, or maybe a
> more classic data structure like rhashtable, might handle this
> workload better.
>
> It's also worth considering deleting each RPC from the recv_queue
> in a less performance-sensitive context, for example, xprt_release,
> rather than in xprt_complete_rqst.
>
> > ---
> >  fs/nfs/nfs3client.c | 1 +
> >  fs/nfs/nfs4client.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> > index 674c012868b1..b0c8a39c2bbd 100644
> > --- a/fs/nfs/nfs3client.c
> > +++ b/fs/nfs/nfs3client.c
> > @@ -111,6 +111,7 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
> >  	cl_init.hostname = buf;
> >
> >  	switch (ds_proto) {
> > +	case XPRT_TRANSPORT_RDMA:
> >  	case XPRT_TRANSPORT_TCP:
> >  	case XPRT_TRANSPORT_TCP_TLS:
> >  		if (mds_clp->cl_nconnect > 1)
> > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > index 11e3a285594c..84573df5cf5a 100644
> > --- a/fs/nfs/nfs4client.c
> > +++ b/fs/nfs/nfs4client.c
> > @@ -924,6 +924,7 @@ static int nfs4_set_client(struct nfs_server *server,
> >  	else
> >  		cl_init.max_connect = max_connect;
> >  	switch (proto) {
> > +	case XPRT_TRANSPORT_RDMA:
> >  	case XPRT_TRANSPORT_TCP:
> >  	case XPRT_TRANSPORT_TCP_TLS:
> >  		cl_init.nconnect = nconnect;
> > @@ -1000,6 +1001,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
> >  	cl_init.hostname = buf;
> >
> >  	switch (ds_proto) {
> > +	case XPRT_TRANSPORT_RDMA:
> >  	case XPRT_TRANSPORT_TCP:
> >  	case XPRT_TRANSPORT_TCP_TLS:
> >  		if (mds_clp->cl_nconnect > 1) {
> > --
> > 2.44.0
> >
>
> --
> Chuck Lever
>
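
As an aside, for concreteness only: below is a minimal sketch of what
an XID-indexed xarray lookup under just the RCU read lock might look
like for the receive queue. This is not Chuck's actual experiment;
the per-transport "recv_xa" xarray and the helper names are made-up
assumptions for illustration.

/*
 * Illustrative sketch, not a real patch: assume each transport gains a
 * "struct xarray recv_xa" keyed by the request XID.
 */
#include <linux/xarray.h>
#include <linux/sunrpc/xprt.h>

/* Insert a request when it is queued to await a reply. */
static int recv_xa_insert(struct xarray *recv_xa, struct rpc_rqst *req)
{
	/* xa_store() serializes writers with the internal xa_lock. */
	return xa_err(xa_store(recv_xa,
			       (unsigned long)be32_to_cpu(req->rq_xid),
			       req, GFP_NOFS));
}

/* Match an incoming reply to its request without taking queue_lock. */
static struct rpc_rqst *recv_xa_lookup(struct xarray *recv_xa, __be32 xid)
{
	/* xa_load() only needs the RCU read lock, which it takes itself,
	 * so the single-threaded Receive completion handler would not
	 * contend on the transport's queue_lock for the lookup. */
	return xa_load(recv_xa, (unsigned long)be32_to_cpu(xid));
}

/* Removal could move to a less hot path, e.g. xprt_release(). */
static void recv_xa_remove(struct xarray *recv_xa, struct rpc_rqst *req)
{
	xa_erase(recv_xa, (unsigned long)be32_to_cpu(req->rq_xid));
}

Per the scaling concern above, something like rhashtable (which also
supports RCU-protected lookups) may be a better fit than an xarray for
the TCP case with tens of thousands of RPCs in flight.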