Re: [PATCH] NFS: enable nconnect for RDMA

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Mar 3, 2024 at 1:35 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On Wed, Feb 28, 2024 at 04:35:23PM -0500, trondmy@xxxxxxxxxx wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > It appears that in certain cases, RDMA capable transports can benefit
> > from the ability to establish multiple connections to increase their
> > throughput. This patch therefore enables the use of the "nconnect" mount
> > option for those use cases.
> >
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>
> No objection to this patch.
>
> You don't mention here if you have root-caused the throughput issue.
> One thing I've noticed is that contention for the transport's
> queue_lock is holding back the RPC/RDMA Receive completion handler,
> which is single-threaded per transport.

Curious: how does a queue_lock per transport is a problem for
nconnect? nconnect would create its own transport, would it now and so
it would have its own queue_lock (per nconnect).

> A way to mitigate this would be to replace the recv_queue
> R-B tree with a data structure that can perform a lookup while
> holding only the RCU read lock. I have experimented with using an
> xarray for the recv_queue, and saw improvement.
>
> The workload on that data structure is different for TCP versus
> RDMA, though: on RDMA, the number of RPCs in flight is significantly
> smaller. For tens of thousands of RPCs in flight, an xarray might
> not scale well. The newer Maple tree or rose bush hash, or maybe a
> more classic data structure like rhashtable, might handle this
> workload better.
>
> It's also worth considering deleting each RPC from the recv_queue
> in a less performance-sensitive context, for example, xprt_release,
> rather than in xprt_complete_rqst.
>
>
> > ---
> >  fs/nfs/nfs3client.c | 1 +
> >  fs/nfs/nfs4client.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
> > index 674c012868b1..b0c8a39c2bbd 100644
> > --- a/fs/nfs/nfs3client.c
> > +++ b/fs/nfs/nfs3client.c
> > @@ -111,6 +111,7 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
> >       cl_init.hostname = buf;
> >
> >       switch (ds_proto) {
> > +     case XPRT_TRANSPORT_RDMA:
> >       case XPRT_TRANSPORT_TCP:
> >       case XPRT_TRANSPORT_TCP_TLS:
> >               if (mds_clp->cl_nconnect > 1)
> > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > index 11e3a285594c..84573df5cf5a 100644
> > --- a/fs/nfs/nfs4client.c
> > +++ b/fs/nfs/nfs4client.c
> > @@ -924,6 +924,7 @@ static int nfs4_set_client(struct nfs_server *server,
> >       else
> >               cl_init.max_connect = max_connect;
> >       switch (proto) {
> > +     case XPRT_TRANSPORT_RDMA:
> >       case XPRT_TRANSPORT_TCP:
> >       case XPRT_TRANSPORT_TCP_TLS:
> >               cl_init.nconnect = nconnect;
> > @@ -1000,6 +1001,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
> >       cl_init.hostname = buf;
> >
> >       switch (ds_proto) {
> > +     case XPRT_TRANSPORT_RDMA:
> >       case XPRT_TRANSPORT_TCP:
> >       case XPRT_TRANSPORT_TCP_TLS:
> >               if (mds_clp->cl_nconnect > 1) {
> > --
> > 2.44.0
> >
> >
>
> --
> Chuck Lever
>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux