On Tue, Jan 19, 2021 at 11:09:55PM +0000, Trond Myklebust wrote:
> On Tue, 2021-01-19 at 17:22 -0500, bfields@xxxxxxxxxxxx wrote:
> > On Wed, Oct 07, 2020 at 04:50:26PM +0000, Trond Myklebust wrote:
> > > As far as I can tell, this thread started with a complaint that
> > > performance suffers when we don't allow setups that hack the
> > > client by pretending that a multi-homed server is actually
> > > multiple different servers.
> > >
> > > AFAICS Tom Talpey's question is the relevant one. Why is there a
> > > performance regression being seen by these setups when they share
> > > the same connection? Is it really the connection, or is it the
> > > fact that they all share the same fixed-slot session?
> > >
> > > I did see Igor's claim that there is a QoS issue (which afaics
> > > would also affect NFSv3), but why do I care about QoS as a
> > > per-mountpoint feature?
> >
> > Sorry for being slow to get back to this.
> >
> > Some more details:
> >
> > Say an NFS server exports /data1 and /data2.
> >
> > A client mounts both. Process 'large' starts creating 10G+ files in
> > /data1, queuing up a lot of nfs WRITE rpc_tasks.
> >
> > Process 'small' creates a lot of small files in /data2, which
> > requires a lot of synchronous rpc_tasks, each of which waits in
> > line with the large WRITE tasks.
> >
> > The 'small' process makes painfully slow progress.
> >
> > The customer previously made things work for them by mounting two
> > different server IP addresses, so the "small" and "large" processes
> > effectively end up with their own queues.
> >
> > Frank Sorenson has a test showing the difference; see
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c42
> > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c43
> >
> > In that test, the "small" process creates files at a rate thousands
> > of times slower when the "large" process is also running.
> >
> > Any suggestions?
>
> I don't see how this answers my questions above? So mainly:
>
> > > Why is there a performance regression being seen by these setups
> > > when they share the same connection? Is it really the connection,
> > > or is it the fact that they all share the same fixed-slot session?

I don't know. Any pointers on how we might go about finding the answer?

It's easy to test the case of entirely separate state & tcp connections.

If we want to test with a shared connection but separate slots, I guess
we'd need to create a separate session for each nfs4_server, and a lot
of functions that currently take an nfs4_client would need to take an
nfs4_server?

--b.
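
(Appended for reference: a rough, untested user-space sketch of the
"large" vs. "small" workload described above. It is not Frank's test
from the bugzilla; the /data1 and /data2 paths, the 1 MiB write size,
and the 60-second run time are arbitrary assumptions. Run it on a
client with both exports mounted from the same server, then again with
the two mounts pointing at different server IPs, and compare the
"small" file-creation rates.)

/*
 * Rough, untested sketch of the workload described above.  This is not
 * Frank's test from the bugzilla; paths, sizes, and the 60-second run
 * time are arbitrary.  /data1 and /data2 are assumed to be two exports
 * of the same server, mounted on the client.
 */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RUNTIME	60			/* seconds */
#define CHUNK	(1024 * 1024)		/* 1 MiB writes for the "large" job */

static volatile sig_atomic_t done;

static void alarm_handler(int sig)
{
	(void)sig;
	done = 1;
}

/* "large": stream big sequential writes, queuing up lots of WRITE rpcs */
static void run_large(const char *dir)
{
	static char buf[CHUNK];
	char path[4096];
	int fd;

	snprintf(path, sizeof(path), "%s/largefile", dir);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		exit(1);
	while (!done)
		if (write(fd, buf, sizeof(buf)) < 0)
			break;
	close(fd);
	exit(0);
}

/* "small": create tiny files as fast as possible and report the rate */
static void run_small(const char *dir)
{
	char path[4096];
	long count = 0;
	int fd;

	while (!done) {
		snprintf(path, sizeof(path), "%s/small-%ld", dir, count);
		fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			break;
		write(fd, "x", 1);
		close(fd);
		count++;
	}
	printf("small: created %ld files in %d seconds (%.1f/sec)\n",
	       count, RUNTIME, (double)count / RUNTIME);
	exit(0);
}

int main(void)
{
	pid_t large, small;

	signal(SIGALRM, alarm_handler);

	large = fork();
	if (large == 0) {
		alarm(RUNTIME);
		run_large("/data1");
	}

	small = fork();
	if (small == 0) {
		alarm(RUNTIME);
		run_small("/data2");
	}

	waitpid(small, NULL, 0);
	waitpid(large, NULL, 0);
	return 0;
}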