> On May 30, 2019, at 6:56 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>
> On Thu, May 30 2019, Chuck Lever wrote:
>
>> Hi Neil-
>>
>> Thanks for chasing this a little further.
>>
>>
>>> On May 29, 2019, at 8:41 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>>>
>>> This patch set is based on the patches in the multipath_tcp branch of
>>> git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git
>>>
>>> I'd like to add my voice to those supporting this work and wanting to
>>> see it land.
>>> We have had customers/partners wanting this sort of functionality for
>>> years. In SLES releases prior to SLE15, we've provided a
>>> "nosharetransport" mount option, so that several filesystems could be
>>> mounted from the same server and each would get its own TCP
>>> connection.
>>
>> Is it well understood why splitting up the TCP connections results
>> in better performance?
>>
>>
>>> In SLE15 we are using this 'nconnect' feature, which is much nicer.
>>>
>>> Partners have assured us that it improves total throughput,
>>> particularly with bonded networks, but we haven't had any concrete
>>> data until Olga Kornievskaia provided some concrete test data - thanks
>>> Olga!
>>>
>>> My understanding, as I explain in one of the patches, is that parallel
>>> hardware is normally utilized by distributing flows, rather than
>>> packets. This avoids out-of-order delivery of packets in a flow.
>>> So multiple flows are needed to utilize parallel hardware.
>>
>> Indeed.
>>
>> However I think one of the problems is what happens in simpler scenarios.
>> We had reports that using nconnect > 1 on virtual clients made things
>> go slower. It's not always wise to establish multiple connections
>> between the same two IP addresses. It depends on the hardware on each
>> end, and the network conditions.
>
> This is a good argument for leaving the default at '1'. When
> documentation is added to nfs(5), we can make it clear that the optimal
> number is dependent on hardware.

Is there any visibility into the NIC hardware that can guide this setting?

>> What about situations where the network capabilities between server and
>> client change? Problem is that neither endpoint can detect that; TCP
>> usually just deals with it.
>
> Being able to manually change (-o remount) the number of connections
> might be useful...

Ugh. I have problems with the administrative interface for this feature,
and this is one of them.

Another is what prevents your client from using a different nconnect=
setting on concurrent mounts of the same server? It's another case of a
per-mount setting being used to control a resource that is shared across
mounts.

Adding user tunables has never been known to increase the aggregate
amount of happiness in the universe. I really hope we can come up with
a better administrative interface... ideally, none would be best.

>> Related Work:
>>
>> We now have protocol (more like conventions) for clients to discover
>> when a server has additional endpoints so that it can establish
>> connections to each of them.
>>
>> https://datatracker.ietf.org/doc/rfc8587/
>>
>> and
>>
>> https://datatracker.ietf.org/doc/draft-ietf-nfsv4-rfc5661-msns-update/
>>
>> Boiled down, the client uses fs_locations and trunking detection to
>> figure out when two IP addresses are the same server instance.
>>
>> This facility can also be used to establish a connection over a
>> different path if network connectivity is lost.
>>
>> There has also been some exploration of MP-TCP. The magic happens
>> under the transport socket in the network layer, and the RPC client
>> is not involved.
>
> I would think that SCTP would be the best protocol for NFS to use as it
> supports multi-streaming - several independent streams. That would
> require that hardware understands it of course.
>
> Though I have examined MP-TCP closely, it looks like it is still fully
> sequenced, so it would be tricky for two RPC messages to be assembled
> into TCP frames completely independently - at least you would need
> synchronization on the sequence number.
>
> Thanks for your thoughts,
> NeilBrown
>
>
>>
>>
>>> Comments most welcome. I'd love to see this, or something similar,
>>> merged.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> ---
>>>
>>> NeilBrown (4):
>>>       NFS: send state management on a single connection.
>>>       SUNRPC: enhance rpc_clnt_show_stats() to report on all xprts.
>>>       SUNRPC: add links for all client xprts to debugfs
>>>
>>> Trond Myklebust (5):
>>>       SUNRPC: Add basic load balancing to the transport switch
>>>       SUNRPC: Allow creation of RPC clients with multiple connections
>>>       NFS: Add a mount option to specify number of TCP connections to use
>>>       NFSv4: Allow multiple connections to NFSv4.x servers
>>>       pNFS: Allow multiple connections to the DS
>>>       NFS: Allow multiple connections to a NFSv2 or NFSv3 server
>>>
>>>
>>>  fs/nfs/client.c                      |    3 +
>>>  fs/nfs/internal.h                    |    2 +
>>>  fs/nfs/nfs3client.c                  |    1
>>>  fs/nfs/nfs4client.c                  |   13 ++++-
>>>  fs/nfs/nfs4proc.c                    |   22 +++++---
>>>  fs/nfs/super.c                       |   12 ++++
>>>  include/linux/nfs_fs_sb.h            |    1
>>>  include/linux/sunrpc/clnt.h          |    1
>>>  include/linux/sunrpc/sched.h         |    1
>>>  include/linux/sunrpc/xprt.h          |    1
>>>  include/linux/sunrpc/xprtmultipath.h |    2 +
>>>  net/sunrpc/clnt.c                    |   98 ++++++++++++++++++++++++++++++++--
>>>  net/sunrpc/debugfs.c                 |   46 ++++++++++------
>>>  net/sunrpc/sched.c                   |    3 +
>>>  net/sunrpc/stats.c                   |   15 +++--
>>>  net/sunrpc/sunrpc.h                  |    3 +
>>>  net/sunrpc/xprtmultipath.c           |   23 +++++++-
>>>  17 files changed, 204 insertions(+), 43 deletions(-)
>>>
>>> --
>>> Signature
>>>
>>
>> --
>> Chuck Lever

--
Chuck Lever
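
A minimal sketch of the flow-distribution point made in the thread, assuming
a device that picks an outgoing link by hashing the flow's addresses and
ports. The hash (CRC32), the addresses, the port numbers, and the names
pick_link and links_used are all hypothetical, chosen only for illustration;
real bonding drivers and NICs apply their own hash policies.

```python
import zlib


def pick_link(flow, n_links):
    """Map a flow tuple to one link index.

    Hypothetical stand-in for a flow-hashing policy: every packet of the
    same flow hashes to the same link, which keeps the flow in order.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


def links_used(src_ports, n_links=4):
    """Return the set of links touched by one TCP flow per source port."""
    flows = [
        # (protocol, client address, client port, server address, NFS port)
        ("tcp", "192.0.2.10", port, "192.0.2.20", 2049)
        for port in src_ports
    ]
    return {pick_link(flow, n_links) for flow in flows}


# One connection: a single flow, so a single link carries all traffic.
single = links_used([50000])

# Eight connections (as nconnect=8 might open): up to four links in use.
many = links_used(range(50000, 50008))

print(len(single), "link used by one connection")
print(len(many), "link(s) used by eight connections")
```

With one connection every packet lands on the same link however many links
exist. With several connections the flows can spread across links, though
nothing in a hash of this kind guarantees an even spread.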