On Tue, 2024-11-19 at 16:23 +0000, Chuck Lever III wrote:
>
> > On Nov 19, 2024, at 10:09 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 2024-11-19 at 06:45 -0500, Jeff Layton wrote:
> > > We attempted to implement the "delstid" draft for v6.13, but have had
> > > to drop the patches for it. After merge, we got a couple of reports
> > > of a performance issue due to the OPEN_XOR_DELEGATION patch:
> > >
> > > https://lore.kernel.org/linux-nfs/202409161645.d44bced5-oliver.sang@xxxxxxxxx/
> > >
> > > Once we enable OPEN_XOR_DELEGATION support, the fsmark "App Overhead"
> > > statistic spikes significantly. The kernel patch for this is very
> > > simple, and doesn't seem likely to cause a performance issue on its
> > > own. My theory is that this test is one that causes the client to
> > > return the delegation, and since it doesn't have an open stateid, it
> > > has to reestablish one during the test run, and that causes the app
> > > overhead stat to spike.
> > >
> > > Trond, Tom, Mike -- I know that the HS Anvil has support for
> > > OPEN_XOR_DELEGATION. If you run the fsmark test against it with that
> > > support both enabled and disabled (either on the client or server
> > > side), do you see a similar spike in "App Overhead"?
> > >
> > > If so, then I suspect we need to consider limiting the use of that
> > > flag in some cases. I have no idea what heuristic we'd use to decide
> > > this though.
> >
> > As already stated when we discussed this at Bakeathon: the server is
> > still in charge of heuristics w.r.t. whether or not there may be
> > contention for the file. The OPEN_XOR_DELEGATION flag changes nothing
> > in that respect.
>
> fsmark is a single-client test. There should be no contention
> for any files during this test.
>
>
> > Yes, I'm sure you can find tests which cause recalls of delegations,
> > and those will be marginally slower when the client has to re-establish
> > an open stateid.
>
> The fsmark result regressed 92%.
>

To be clear, it is the fsmark "App Overhead" statistic that regressed
92%, and that statistic has a curious definition:

"App overhead is time in microseconds spent in the test not doing file
writing related system calls."

It encompasses a bunch of different test setup stuff, and it's been
difficult to nail down the part that is slower. See Oliver's email
here:

https://lore.kernel.org/linux-nfs/ZwTm4e5JxOOJc7JC@xsang-OptiPlex-9020/
-- 
Jeff Layton <jlayton@xxxxxxxxxx>