Re: [PATCH - RFC] new "nosharetransport" option for NFS mounts.

On Tue, 2013-07-09 at 13:22 +1000, NeilBrown wrote:
> On Mon, 8 Jul 2013 18:51:40 +0000 "Myklebust, Trond"
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> 
> > On Mon, 2013-07-08 at 09:58 +1000, NeilBrown wrote:
> > > 
> > > This patch adds a "nosharetransport" option to allow two different
> > > mounts from the same server to use different transports.
> > > If the mounts use NFSv4, or are of the same filesystem, then
> > > "nosharecache" must be used as well.
> > 
> > Won't this interfere with the recently added NFSv4 trunking detection?
> 
> Will it?  I googled around a bit but couldn't find anything that told me
> what trunking really is in this context.  Then I found commit 05f4c350ee02,
> which makes it quite clear (thanks Chuck!).
> 
> Probably the code I wrote could interfere.
> 
> > 
> > Also, how will it work with NFSv4.1 sessions? The server will usually
> > require a BIND_CONN_TO_SESSION when new TCP connections attempt to
> > attach to an existing session.
> 
> Why would it attempt to attach to an existing session?  I would hope that
> the two different mounts with separate TCP connections would look completely
> separate - different transport, different cache, different session.
> ??

Currently we map sessions and leases 1-1. You'd have quite some work to
do to change that, and it is very unclear to me that there is any
benefit to doing so.
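
For reference, my understanding of the proposed usage is roughly the
following; this is only a sketch, with the option names taken from the
patch description quoted above:

    # Two mounts from the same server, each on its own TCP connection.
    # NFSv4 mounts, or mounts of the same filesystem, also need
    # "nosharecache" per the patch description.
    mount -t nfs -o nosharetransport,nosharecache server:/export/a /mnt/a
    mount -t nfs -o nosharetransport,nosharecache server:/export/b /mnt/b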

> > 
> > > 2/ If a very fast network is used with a many-processor client, a
> > >   single TCP connection can present a bottleneck which reduces total
> > >   throughput.  Using multiple TCP connections (one per mount) removes
> > >   the bottleneck.
> > >   An alternate workaround is to configure multiple virtual IP
> > >   addresses on the server and mount each filesystem from a different
> > >   IP.  This is effective (throughput goes up) but it imposes an
> > >   administrative burden.
> > 
> > As I understand it, using multiple simultaneous TCP connections between
> > the same endpoints also adds a risk that the congestion windows will
> > interfere. Do you have numbers to back up the claim of a performance
> > improvement?
> 
> A customer upgraded from SLES10 (2.6.16 based) to SLES11 (3.0 based) and saw
> a slowdown of between 1.5 and 2 times on some large DB jobs (i.e. total time
> 150% to 200% of what it was before).
> After some analysis they created multiple virtual IPs on the server and
> mounted the several filesystems each from a different IP, which got the
> performance back (they see this as a work-around rather than a genuine
> solution).
> Numbers are like "500 MB/s on a single connection, 850 MB/s peaking at
> 1000 MB/s on multiple connections".
> 
> If I can get something more concrete I'll let you know.
> 
> As this worked well in 2.6.16 (which doesn't try to share connections), this
> is seen as a regression.
> 
> On links that are easy to saturate, congestion windows are important and
> having a single connection is probably a good idea - so the current default
> is certainly correct.
> On a 10G Ethernet or InfiniBand connection (where the issue has been
> measured), congestion just doesn't seem to be a problem.

It would help if we can understand where the actual bottleneck is. If
this really is about lock contention, then solving that problem might
help the single mount case too...
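
For comparison, the virtual-IP workaround described above presumably looks
something like the following (the server addresses and export names here
are hypothetical):

    # Each filesystem is mounted from a different virtual IP on the same
    # server, so each mount gets its own TCP connection.
    mount -t nfs 192.0.2.11:/export/db1 /mnt/db1
    mount -t nfs 192.0.2.12:/export/db2 /mnt/db2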

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com