Re: [PATCH v1 0/5] NFSv3 client RDMA multipath enhancements

Hey Dan-


First, thanks for posting patches!


> On Jan 21, 2021, at 2:10 PM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> The purpose of the following changes is to allow specifying multiple
> target IP addresses in a single mount. Combining this with nconnect and
> servers that support exposing multiple ports,

"port" is probably a bad term to use here, as that term already
has a particular meaning when it comes to IP addresses. In
standards documents, we've stuck with the term "endpoint".

I worked with the IETF's nfsv4 WG a couple years ago to produce
a document that describes how we want NFS servers to advertise
their network configuration to clients.

https://datatracker.ietf.org/doc/rfc8587/

That gives a flavor for what we've done for NFSv4. IMO anything
done for NFSv3 ought to leverage similar principles and tactics.


> we can achieve load
> balancing and much greater throughput, especially on RDMA setups,
> even with the older NFSv3 protocol.

I support the basic goal of increasing transport parallelism.

As you probably became aware as you worked on these patches, the
Linux client shares one or a small set of connections across all
mount points of the same server. So a mount option that adds this
kind of control is going to be awkward.

Anna has proposed a /sys API that would enable this information to
be programmed into the kernel for all mount points sharing the
same set of connections. That would be a little nicer for building
separate administrator tools against, or even for providing an
automation mechanism (like an orchestrator) that would enable
clients to automatically fail over to a different server interface.

I'd prefer to see a user space policy / tool that manages
endpoint lists and passes them to the kernel client dynamically
via Anna's API instead of adding one or more mount options, which
would be fixed for the life of the mount and shared with other
mount points that use the same transports to communicate with
the NFS server.


As far as the NUMA affinity issues go, in the past I've attempted
to provide some degree of CPU affinity between RPC Call and Reply
handling only to find that it reduced performance unacceptably.
Perhaps something that is node-aware or LLC-aware would be better
than CPU affinity, and I'm happy to discuss that and any other
ways we might improve NFS behavior on NUMA systems. It's quite
true that RDMA transports are more sensitive to NUMA than
traditional socket-based ones.


> The changes allow specifying a new `remoteports=<IP-addresses-ranges>`
> mount option providing a group of IP addresses, from which `nconnect` at
> sunrpc scope picks target transport address in round-robin. There's also
> an accompanying `localports` parameter that allows local address bind so
> that the source port is better controlled in a way to ensure that
> transports are not hogging a single local interface.
> 
> This patchset targets the linux-next tree.
> 
> Dan Aloni (5):
>  sunrpc: Allow specifying a vector of IP addresses for nconnect
>  xprtrdma: Bind to a local address if requested
>  nfs: Extend nconnect with remoteports and localports mount params
>  sunrpc: Add srcaddr to xprt sysfs debug
>  nfs: Increase NFS_MAX_CONNECTIONS
> 
> fs/nfs/client.c                            |  24 +++
> fs/nfs/fs_context.c                        | 173 ++++++++++++++++++++-
> fs/nfs/internal.h                          |   4 +
> include/linux/nfs_fs_sb.h                  |   2 +
> include/linux/sunrpc/clnt.h                |   9 ++
> include/linux/sunrpc/xprt.h                |   1 +
> net/sunrpc/clnt.c                          |  47 ++++++
> net/sunrpc/debugfs.c                       |   8 +-
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   2 +-
> net/sunrpc/xprtrdma/transport.c            |  17 +-
> net/sunrpc/xprtrdma/verbs.c                |  15 +-
> net/sunrpc/xprtrdma/xprt_rdma.h            |   5 +-
> net/sunrpc/xprtsock.c                      |  49 +++---
> 13 files changed, 329 insertions(+), 27 deletions(-)
> 
> -- 
> 2.26.2
> 
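
For readers following along, the round-robin selection that the quoted
cover letter describes for `remoteports` can be pictured roughly as
below. This is only a user-space sketch under stated assumptions, not
code from the patches: the helper name `pick_remote_addr`, its
parameters, and the use of plain strings for addresses are all
hypothetical, and the real series works on kernel sockaddr structures
inside sunrpc.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the patch series): return the
 * remote address that the xprt_index-th transport (0-based) would
 * connect to when the configured addresses are handed out in
 * round-robin order. Returns NULL if the address list is empty.
 */
static const char *pick_remote_addr(const char *const *addrs, size_t naddrs,
                                    size_t xprt_index)
{
	if (addrs == NULL || naddrs == 0)
		return NULL;

	/* Wrap around once every address has been used. */
	return addrs[xprt_index % naddrs];
}
```

With three addresses and nconnect=4, transports 0 through 2 would each
get a distinct address and transport 3 would wrap back to the first
one. A fixed modulo like this is static for the life of the transport
set; it does not by itself handle an endpoint going away.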

--
Chuck Lever






