On Tue, Jan 19, 2021 at 11:09:55PM +0000, Trond Myklebust wrote:
> On Tue, 2021-01-19 at 17:22 -0500, bfields@xxxxxxxxxxxx wrote:
> > On Wed, Oct 07, 2020 at 04:50:26PM +0000, Trond Myklebust wrote:
> > > As far as I can tell, this thread started with a complaint that
> > > performance suffers when we don't allow setups that hack the
> > > client by pretending that a multi-homed server is actually
> > > multiple different servers.
> > >
> > > AFAICS Tom Talpey's question is the relevant one. Why is there a
> > > performance regression being seen by these setups when they share
> > > the same connection? Is it really the connection, or is it the
> > > fact that they all share the same fixed-slot session?
> > >
> > > I did see Igor's claim that there is a QoS issue (which afaics
> > > would also affect NFSv3), but why do I care about QoS as a
> > > per-mountpoint feature?
> >
> > Sorry for being slow to get back to this.
> >
> > Some more details:
> >
> > Say an NFS server exports /data1 and /data2.
> >
> > A client mounts both. Process 'large' starts creating 10G+ files in
> > /data1, queuing up a lot of nfs WRITE rpc_tasks.
> >
> > Process 'small' creates a lot of small files in /data2, which
> > requires a lot of synchronous rpc_tasks, each of which waits in
> > line with the large WRITE tasks.
> >
> > The 'small' process makes painfully slow progress.
> >
> > The customer previously made things work for them by mounting two
> > different server IP addresses, so the "small" and "large" processes
> > effectively end up with their own queues.
> >
> > Frank Sorenson has a test showing the difference; see
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c42
> > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c43
> >
> > In that test, the "small" process creates files at a rate thousands
> > of times slower when the "large" process is also running.
> >
> > Any suggestions?
>
> I don't see how this answers my questions above? So mainly:
>
> > > Why is there a performance regression being seen by these setups
> > > when they share the same connection? Is it really the connection,
> > > or is it the fact that they all share the same fixed-slot session?

I don't know. Any pointers on how we might go about finding the answer?

It's easy to test the case of entirely separate state & tcp connections.

If we want to test with a shared connection but separate slots, I guess
we'd need to create a separate session for each nfs4_server, and a lot
of functions that currently take an nfs4_client would need to take an
nfs4_server?

--b.
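
(Appended for reference: a rough, untested user-space sketch of the
"large" vs. "small" workload described above. It is not Frank's test
from the bugzilla; the /data1 and /data2 paths, the 1 MiB write size,
and the 60-second run time are arbitrary assumptions. Run it on a
client with both exports mounted from the same server, then again with
the two mounts pointing at different server IPs, and compare the
"small" file-creation rates.)

/*
 * Rough, untested sketch of the workload described above.  This is not
 * Frank's test from the bugzilla; paths, sizes, and the 60-second run
 * time are arbitrary.  /data1 and /data2 are assumed to be two exports
 * of the same server, mounted on the client.
 */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RUNTIME	60			/* seconds */
#define CHUNK	(1024 * 1024)		/* 1 MiB writes for the "large" job */

static volatile sig_atomic_t done;

static void alarm_handler(int sig)
{
	(void)sig;
	done = 1;
}

/* "large": stream big sequential writes, queuing up lots of WRITE rpcs */
static void run_large(const char *dir)
{
	static char buf[CHUNK];
	char path[4096];
	int fd;

	snprintf(path, sizeof(path), "%s/largefile", dir);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		exit(1);
	while (!done)
		if (write(fd, buf, sizeof(buf)) < 0)
			break;
	close(fd);
	exit(0);
}

/* "small": create tiny files as fast as possible and report the rate */
static void run_small(const char *dir)
{
	char path[4096];
	long count = 0;
	int fd;

	while (!done) {
		snprintf(path, sizeof(path), "%s/small-%ld", dir, count);
		fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			break;
		write(fd, "x", 1);
		close(fd);
		count++;
	}
	printf("small: created %ld files in %d seconds (%.1f/sec)\n",
	       count, RUNTIME, (double)count / RUNTIME);
	exit(0);
}

int main(void)
{
	pid_t large, small;

	signal(SIGALRM, alarm_handler);

	large = fork();
	if (large == 0) {
		alarm(RUNTIME);
		run_large("/data1");
	}

	small = fork();
	if (small == 0) {
		alarm(RUNTIME);
		run_small("/data2");
	}

	waitpid(small, NULL, 0);
	waitpid(large, NULL, 0);
	return 0;
}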