Re: [PATCH 0/9] Multiple network connections for a single NFS mount.

> On May 30, 2019, at 6:56 PM, NeilBrown <neilb@xxxxxxxx> wrote:
> 
> On Thu, May 30 2019, Chuck Lever wrote:
> 
>> Hi Neil-
>> 
>> Thanks for chasing this a little further.
>> 
>> 
>>> On May 29, 2019, at 8:41 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>>> 
>>> This patch set is based on the patches in the multipath_tcp branch of
>>> git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git
>>> 
>>> I'd like to add my voice to those supporting this work and wanting to
>>> see it land.
>>> We have had customers/partners wanting this sort of functionality for
>>> years.  In SLES releases prior to SLE15, we've provided a
>>> "nosharetransport" mount option, so that several filesystems could be
>>> mounted from the same server and each would get its own TCP
>>> connection.
>> 
>> Is it well understood why splitting up the TCP connections results
>> in better performance?
>> 
>> 
>>> In SLE15 we are using this 'nconnect' feature, which is much nicer.
>>> 
>>> Partners have assured us that it improves total throughput,
>>> particularly with bonded networks, but we haven't had any concrete
>>> data until Olga Kornievskaia provided some concrete test data - thanks
>>> Olga!
>>> 
>>> My understanding, as I explain in one of the patches, is that parallel
>>> hardware is normally utilized by distributing flows, rather than
>>> packets.  This avoids out-of-order delivery of packets in a flow.
>>> So multiple flows are needed to utilize parallel hardware.
>> 
>> Indeed.
>> 
>> However I think one of the problems is what happens in simpler scenarios.
>> We had reports that using nconnect > 1 on virtual clients made things
>> go slower. It's not always wise to establish multiple connections
>> between the same two IP addresses. It depends on the hardware on each
>> end, and the network conditions.
> 
> This is a good argument for leaving the default at '1'.  When
> documentation is added to nfs(5), we can make it clear that the optimal
> number is dependent on hardware.

Is there any visibility into the NIC hardware that can guide this setting?
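
To make the flow-versus-packet point above concrete, here is a minimal
sketch of flow-based distribution. It is a toy model only: the function
names, the addresses, and the use of CRC32 as the hash are assumptions
for illustration, and real NICs and bonding drivers use their own hash
policies. It shows why a single TCP connection can only ever occupy one
link, while several connections at least have the chance to spread out.

```python
import zlib
from collections import Counter


def pick_link(src_ip, src_port, dst_ip, dst_port, n_links):
    """Map a flow's address/port tuple to one of n_links.

    Hypothetical policy: CRC32 over the tuple stands in for whatever
    hash the hardware or bonding driver really uses.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


def links_used(src_ports, n_links=2, packets_per_flow=1000):
    """Count packets per link when each source port is one TCP flow."""
    usage = Counter()
    for port in src_ports:
        # Every packet of a flow hashes identically, so the whole flow
        # stays on one link and is never reordered across links.
        link = pick_link("192.0.2.10", port, "192.0.2.20", 2049, n_links)
        usage[link] += packets_per_flow
    return usage


single = links_used([50000])                 # one connection (nconnect=1)
several = links_used(range(50000, 50008))    # eight connections
print("one flow:    ", dict(single))
print("eight flows: ", dict(several))
```

With one flow, exactly one link carries all the traffic regardless of
how many links exist. With eight flows the split depends entirely on
the hash, which is also why the best connection count is hardware
dependent.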


>> What about situations where the network capabilities between server and
>> client change? Problem is that neither endpoint can detect that; TCP
>> usually just deals with it.
> 
> Being able to manually change (-o remount) the number of connections
> might be useful...

Ugh. I have problems with the administrative interface for this feature,
and this is one of them.

Another is what prevents your client from using a different nconnect=
setting on concurrent mounts of the same server? It's another case of a
per-mount setting being used to control a resource that is shared across
mounts.
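
A toy model of the conflict, purely for illustration. The class and
function names are hypothetical, and the "first mount wins" rule is an
assumption of this sketch, not a description of what the patch set
does. The point is only that the connection pool is keyed by server
while the option is supplied per mount, so a second mount's request can
be silently ignored.

```python
class TransportPool:
    """Hypothetical stand-in for the connections shared by every mount
    of one server."""

    def __init__(self, server, nconnect):
        self.server = server
        self.nconnect = nconnect


_pools = {}  # shared state, keyed by server only


def mount(server, export, nconnect=1):
    """Return (pool, requested_nconnect) for a new mount.

    Assumed policy: the first mount of a server creates the pool and
    later mounts reuse it unchanged.
    """
    pool = _pools.get(server)
    if pool is None:
        pool = _pools[server] = TransportPool(server, nconnect)
    return pool, nconnect


a_pool, a_req = mount("nfs.example.com", "/export/a", nconnect=4)
b_pool, b_req = mount("nfs.example.com", "/export/b", nconnect=1)

# The second mount asked for 1 connection but shares the pool of 4.
print(b_req, b_pool.nconnect, a_pool is b_pool)
```

Whatever policy is chosen (first wins, last wins, maximum), at least
one mount ends up with a setting it did not ask for.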

Adding user tunables has never been known to increase the aggregate
amount of happiness in the universe. I really hope we can come up with
a better administrative interface... ideally, none would be best.


>> Related Work:
>> 
>> We now have a protocol (more like conventions) for clients to discover
>> when a server has additional endpoints so that it can establish
>> connections to each of them.
>> 
>> https://datatracker.ietf.org/doc/rfc8587/
>> 
>> and
>> 
>> https://datatracker.ietf.org/doc/draft-ietf-nfsv4-rfc5661-msns-update/
>> 
>> Boiled down, the client uses fs_locations and trunking detection to
>> figure out when two IP addresses are the same server instance.
>> 
>> This facility can also be used to establish a connection over a
>> different path if network connectivity is lost.
>> 
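
As a rough illustration of the trunking check described above, the
sketch below compares the identity that a server reports on two
different addresses. It assumes the NFSv4.1 convention of comparing
the server owner (major and minor ID) and server scope returned by
EXCHANGE_ID; the type and function names are hypothetical, and a real
client must also verify the server's claim rather than trust the
values alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdentity:
    """Identity values a server reports on one address (simplified)."""
    major_id: bytes
    minor_id: int
    scope: bytes


def trunking_kind(a, b):
    """Classify two addresses by the identities reported on each.

    Simplified rule, assumed for this sketch:
      - different major ID or scope: treat as different servers
      - same major ID and scope, same minor ID: sessions can be shared
      - same major ID and scope, different minor ID: only the client ID
        can be shared
    """
    if a.major_id != b.major_id or a.scope != b.scope:
        return "none"
    if a.minor_id == b.minor_id:
        return "session"
    return "clientid"


addr1 = ServerIdentity(b"srv-A", 1, b"example-scope")
addr2 = ServerIdentity(b"srv-A", 1, b"example-scope")
addr3 = ServerIdentity(b"srv-B", 1, b"example-scope")

print(trunking_kind(addr1, addr2))
print(trunking_kind(addr1, addr3))
```
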
>> There has also been some exploration of MP-TCP. The magic happens
>> under the transport socket in the network layer, and the RPC client
>> is not involved.
> 
> I would think that SCTP would be the best protocol for NFS to use as it
> supports multi-streaming - several independent streams.  That would
> require that hardware understands it of course.
> 
> Though I have examined MP-TCP closely, it looks like it is still fully
> sequenced, so it would be tricky for two RPC messages to be assembled
> into TCP frames completely independently - at least you would need
> synchronization on the sequence number.
> 
> Thanks for your thoughts,
> NeilBrown
> 
> 
>> 
>> 
>>> Comments most welcome.  I'd love to see this, or something similar,
>>> merged.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> ---
>>> 
>>> NeilBrown (4):
>>>     NFS: send state management on a single connection.
>>>     SUNRPC: enhance rpc_clnt_show_stats() to report on all xprts.
>>>     SUNRPC: add links for all client xprts to debugfs
>>> 
>>> Trond Myklebust (5):
>>>     SUNRPC: Add basic load balancing to the transport switch
>>>     SUNRPC: Allow creation of RPC clients with multiple connections
>>>     NFS: Add a mount option to specify number of TCP connections to use
>>>     NFSv4: Allow multiple connections to NFSv4.x servers
>>>     pNFS: Allow multiple connections to the DS
>>>     NFS: Allow multiple connections to a NFSv2 or NFSv3 server
>>> 
>>> 
>>> fs/nfs/client.c                      |    3 +
>>> fs/nfs/internal.h                    |    2 +
>>> fs/nfs/nfs3client.c                  |    1 
>>> fs/nfs/nfs4client.c                  |   13 ++++-
>>> fs/nfs/nfs4proc.c                    |   22 +++++---
>>> fs/nfs/super.c                       |   12 ++++
>>> include/linux/nfs_fs_sb.h            |    1 
>>> include/linux/sunrpc/clnt.h          |    1 
>>> include/linux/sunrpc/sched.h         |    1 
>>> include/linux/sunrpc/xprt.h          |    1 
>>> include/linux/sunrpc/xprtmultipath.h |    2 +
>>> net/sunrpc/clnt.c                    |   98 ++++++++++++++++++++++++++++++++--
>>> net/sunrpc/debugfs.c                 |   46 ++++++++++------
>>> net/sunrpc/sched.c                   |    3 +
>>> net/sunrpc/stats.c                   |   15 +++--
>>> net/sunrpc/sunrpc.h                  |    3 +
>>> net/sunrpc/xprtmultipath.c           |   23 +++++++-
>>> 17 files changed, 204 insertions(+), 43 deletions(-)
>>> 
>>> --
>>> Signature
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever






