On Wed, Jan 20, 2021 at 10:07:37AM -0500, bfields@xxxxxxxxxxxx wrote:
> On Tue, Jan 19, 2021 at 11:09:55PM +0000, Trond Myklebust wrote:
> > On Tue, 2021-01-19 at 17:22 -0500, bfields@xxxxxxxxxxxx wrote:
> > > On Wed, Oct 07, 2020 at 04:50:26PM +0000, Trond Myklebust wrote:
> > > > As far as I can tell, this thread started with a complaint that
> > > > performance suffers when we don't allow setups that hack the client
> > > > by pretending that a multi-homed server is actually multiple
> > > > different servers.
> > > >
> > > > AFAICS Tom Talpey's question is the relevant one. Why is there a
> > > > performance regression being seen by these setups when they share
> > > > the same connection? Is it really the connection, or is it the fact
> > > > that they all share the same fixed-slot session?
> > > >
> > > > I did see Igor's claim that there is a QoS issue (which afaics would
> > > > also affect NFSv3), but why do I care about QoS as a per-mountpoint
> > > > feature?
> > >
> > > Sorry for being slow to get back to this.
> > >
> > > Some more details:
> > >
> > > Say an NFS server exports /data1 and /data2.
> > >
> > > A client mounts both. Process 'large' starts creating 10G+ files in
> > > /data1, queuing up a lot of nfs WRITE rpc_tasks.
> > >
> > > Process 'small' creates a lot of small files in /data2, which requires
> > > a lot of synchronous rpc_tasks, each of which waits in line with the
> > > large WRITE tasks.
> > >
> > > The 'small' process makes painfully slow progress.
> > >
> > > The customer previously made things work for them by mounting two
> > > different server IP addresses, so the "small" and "large" processes
> > > effectively end up with their own queues.
> > >
> > > Frank Sorenson has a test showing the difference; see
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c42
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1703850#c43
> > >
> > > In that test, the "small" process creates files at a rate thousands of
> > > times slower when the "large" process is also running.
> > >
> > > Any suggestions?
> >
> > I don't see how this answers my questions above?
>
> So mainly:
>
> > > > Why is there a performance regression being seen by these setups
> > > > when they share the same connection? Is it really the connection,
> > > > or is it the fact that they all share the same fixed-slot session?
>
> I don't know. Any pointers how we might go about finding the answer?

I set this aside and then get bugged about it again.

I apologize, I don't understand what you're asking for here, but it
seemed obvious to you and Tom, so I'm sure the problem is me. Are you
free for a call sometime maybe? Or do you have any suggestions for how
you'd go about investigating this?

Would it be worth experimenting with giving some sort of advantage to
readers? (E.g., reserving a few slots for reads and getattrs and such?
There's a toy sketch of what I mean below, after the quoted text.)

--b.

> It's easy to test the case of entirely separate state & tcp connections.
>
> If we want to test with a shared connection but separate slots I guess
> we'd need to create a separate session for each nfs4_server, and a lot
> of functions that currently take an nfs4_client would need to take an
> nfs4_server?
>
> --b.
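
P.S. To make the "reserving a few slots" idea a little more concrete,
here's a toy sketch. This is not the real sunrpc/nfs4 slot table code;
the structure and every name below are invented for illustration. The
only point is that short operations get to dip into a small reserve
that a backlog of bulk WRITEs can never occupy:

#include <stdbool.h>

/* Toy stand-in for a session slot table -- not the kernel's. */
struct toy_slot_table {
	unsigned int in_use;     /* slots currently handed out */
	unsigned int max_slots;  /* total slots the server granted */
	unsigned int reserved;   /* reserve usable only by short ops */
};

/* Would we give this request a slot right now? */
static bool toy_slot_available(const struct toy_slot_table *tbl,
			       bool is_short_op)
{
	unsigned int limit = tbl->max_slots;

	/* Bulk ops (e.g. WRITE) may not dip into the reserve. */
	if (!is_short_op)
		limit -= tbl->reserved;

	return tbl->in_use < limit;
}

So with, say, max_slots = 64 and reserved = 4, a flood of WRITEs can
hold at most 60 slots, and a GETATTR only ever waits for one of the
four reserved slots to turn over rather than for the whole WRITE queue.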
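
And on the "separate session for each nfs4_server" test quoted above,
the shape of the change I'm imagining is roughly the following. Again
a toy model, not a patch: as I recall the session today hangs off the
client side (shared by every mount of that server), but the structs and
the per-server session pointer below are simplified/hypothetical:

/* Toy model of the experiment -- not kernel code. */
struct toy_session {
	unsigned int nr_slots;   /* stands in for the slot table */
};

/* Roughly the current shape: one session per client, shared by
 * every mount of that server. */
struct toy_nfs_client {
	struct toy_session *cl_session;
};

struct toy_nfs_server {                      /* one per mount/export */
	struct toy_nfs_client *client;
	struct toy_session *per_server_session;  /* the experiment */
};

/* Today: every mount funnels through the client-wide session. */
static struct toy_session *session_today(struct toy_nfs_server *server)
{
	return server->client->cl_session;
}

/* Experiment: each mount gets its own session (and so its own slot
 * table) while still sharing the client's TCP connection; callers
 * that currently take the client would need to take the server. */
static struct toy_session *session_experiment(struct toy_nfs_server *server)
{
	return server->per_server_session;
}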