On Sat, 2024-01-13 at 16:10 +0000, Chuck Lever III wrote:
> 
> > On Jan 13, 2024, at 10:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > On Sat, 2024-01-13 at 15:47 +0100, Roland Mainz wrote:
> > > On Sat, Jan 13, 2024 at 1:19 AM Dan Shelton <dan.f.shelton@xxxxxxxxx> wrote:
> > > > We've been experiencing significant nfsd performance problems with a
> > > > customer who has a deeply nested filesystem hierarchy, lots of
> > > > subdirs, some of them 60-80 dirs deep (!!), which leads to an
> > > > exponential slowdown with nfsd accesses.
> > > > 
> > > > Some of the issues have been addressed by implementing a better
> > > > directory walker via multiple dir fds and openat() (instead of just
> > > > cwd+open()), but the nfsd side was still a pretty dramatic issue,
> > > > until we bumped #define NFSD_MAX_OPS_PER_COMPOUND in
> > > > linux-6.7/fs/nfsd/nfsd.h from 50 to 96. After that the nfsd side
> > > > was MUCH more performant.
> > > 
> > > More general question:
> > > Is it feasible to turn the values for NFSD_MAX_* (max_ops,
> > > max_req, etc., i.e. everything which is negotiated in an NFSv4.1
> > > session) into tunables, which are set at nfsd startup? It might help
> > > with Dan's scenario, benchmarking, client testing (e.g. my case, where
> > > I switched to nfs4j) and tuning...
> > 
> > (re-cc'ing the mailing list...)
> > 
> > We generally don't like to add knobs like this when we can get by with
> > just tuning a sane value for everyone. This particular value governs the
> > maximum number of operations per compound. I don't see any value in
> > keeping it artificially low.
> > 
> > The only real argument against it that I can see is that it might make
> > it easier for a malicious or badly-designed client to DoS the server.
> > That's certainly something we should be wary of, but I don't expect that
> > increasing the max from 50 to ~100 will make a big difference there.
> 
> The server allocates memory and other resources based on the
> largest COMPOUND it expects.
> 
> If we crank up the maximum number, it has an impact on server
> resource utilization. In particular, those extra COMPOUND
> slots will almost never be used except in a handful of corner
> cases.
> 
> Plus, this becomes a race against applications and workloads
> that try to consume past that limit. We bump it, they use
> more and hit the new limit. We bump it, lather, rinse,
> repeat.
> 
> Indeed, if we increase that value enough, it does become a
> server DoS vector by tying up all available nfsd threads
> trying to execute enormous COMPOUNDs.
> 
> The upshot is that I'm not in favor of increasing the max-ops
> limit or making it tunable, unless we have grossly misunderstood
> the issue.

Does it? The only thing I can see that scales directly with that value
is the size of struct nfsd_genl_rqstp. That's just part of the new
netlink stats interface, so I don't see that as a show-stopper.

Am I missing something else that scales directly with
NFSD_MAX_OPS_PER_COMPOUND?

> > > Solaris 11 is known to send COMPOUNDs that are too large
> > > during mount, but the rest of the time these three client
> > > implementations are not known to send large COMPOUNDs.
> > 
> > Actually the FreeBSD client is the same as Solaris, in that it does the
> > entire mount path in one compound. If you were to attempt a mount
> > with more than 48 components, it would exceed 50 ops in the compound.
> > I don't think it can exceed 50 ops any other way.
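
(To spell out the arithmetic for anyone following along: a whole-path
mount COMPOUND looks roughly like the sketch below. This is just my
illustration of the op sequence, not a capture of what FreeBSD or
Solaris actually sends, and clients may tack a GETATTR or similar on
the end as well.)

	PUTROOTFH
	LOOKUP "a"		(one LOOKUP per path component)
	LOOKUP "b"
	...
	LOOKUP "component48"
	GETFH

That's 1 + 48 + 1 = 50 ops for a 48-component path, so one more
component pushes it past the default limit.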
> 
> I'd like to see the raw packet captures to confirm that our
> speculation about the problem is indeed correct. Since this
> limit is hit only when mounting (and not at all by Linux
> clients), I don't yet see how that would "make NFSD slow".

It seems quite plausible that keeping the max low forces the client to
do a deep pathwalk using multiple RPCs instead of one. That seems like
it could have performance implications.

> > > I guess your clients are trying to do a long pathwalk in a single
> > > COMPOUND?
> > 
> > Is there a problem with that (assuming NFSv4.1 session limits are
> > honored)?
> 
> Yes: very clearly the client will hit a rather artificial
> path length limit. And the limit isn't based on the character
> length of the path: the limit is hit much sooner with a path
> that is constructed from a series of very short component
> names, for instance.
> 
> Good client implementations keep the number of operations per
> COMPOUND limited to a small number, and break up operations
> like path walks to ensure that the protocol and server
> implementation do not impose any kind of application-visible
> constraint.

Sure, and good servers try their best to deal with whatever the clients
throw at them. I don't really see the value in limiting the number of
ops per compound. Are we really any better off having the client break
those up into multiple round trips? Why?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
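
[Editor's note: to make the round-trip cost mentioned above concrete,
here is a toy userspace sketch, not code from any real client. The
assumption that each COMPOUND spends two ops on PUTFH/GETFH-style
bookkeeping, leaving max_ops - 2 LOOKUPs per round trip, is mine.]

	/*
	 * Toy model: how many round trips a deep pathwalk needs when
	 * the server caps each COMPOUND at max_ops. Assumes (my
	 * assumption) 2 ops of bookkeeping per COMPOUND, so each
	 * round trip carries at most max_ops - 2 LOOKUPs.
	 */
	#include <stdio.h>

	static unsigned int pathwalk_round_trips(unsigned int components,
						 unsigned int max_ops)
	{
		unsigned int lookups_per_rpc = max_ops - 2;

		/* ceiling division: one COMPOUND per chunk of components */
		return (components + lookups_per_rpc - 1) / lookups_per_rpc;
	}

	int main(void)
	{
		/* Dan's 80-deep tree, before and after the bump from 50 to 96 */
		printf("max_ops=50: %u round trips\n",
		       pathwalk_round_trips(80, 50));
		printf("max_ops=96: %u round trips\n",
		       pathwalk_round_trips(80, 96));
		return 0;
	}

Under those assumptions an 80-deep walk takes two RPCs at max_ops=50
and one at max_ops=96; repeated across a tree walker visiting many
such directories, that difference could add up.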