Re: kernel.org list issues... / was: Fwd: Turn NFSD_MAX_* into tuneables ? / was: Re: Increasing NFSD_MAX_OPS_PER_COMPOUND to 96

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Sun, 14 Jan 2024 20:22:53 +0000

> On Jan 14, 2024, at 12:50 PM, Cedric Blancher <cedric.blancher@xxxxxxxxx> wrote:
> 
> On Sat, 13 Jan 2024 at 17:11, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>>> On Jan 13, 2024, at 10:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Sat, 2024-01-13 at 15:47 +0100, Roland Mainz wrote:
>>>> 
>>>> On Sat, Jan 13, 2024 at 1:19 AM Dan Shelton <dan.f.shelton@xxxxxxxxx> wrote:
>>> Is there a problem with that (assuming NFSv4.1 session limits are honored) ?
>> 
>> Yes: very clearly the client will hit a rather artificial
>> path length limit. And the limit isn't based on the character
>> length of the path: the limit is hit much sooner with a path
>> that is constructed from a series of very short component
>> names, for instance.
>> 
>> Good client implementations keep the number of operations per
>> COMPOUND limited to a small number, and break up operations
>> like path walks to ensure that the protocol and server
>> implementation do not impose any kind of application-visible
>> constraint.
> 
> This is not "good client implementation", this is bad design to force
> single operations into smaller pieces.

NFSv4 client implementers have had 20+ years to find
ways to innovate using complex COMPOUNDs, and have yet
to do so. I am not forcing any design constraint on
NFSv4 clients -- clients already work this way, because
their VFS layers have already broken up the operations
before the NFS client layer even sees them.

You can blame the design of VFS for that. It really
isn't the result of NFSv4's COMPOUND architecture.

Now, for Dan's issue:

The mean size of NFSv4 COMPOUNDs observed in packet
captures is less than 10 ops. A 50-operation max-ops
limit has zero effect on the vast majority of on-the-wire
operations from these clients. Doubling that limit will
have no impact on these operations.
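
To make that arithmetic concrete, here is a minimal sketch of the check
one could run over a capture. The function name and the sample op counts
are hypothetical (they are NOT taken from any real packet capture); the
point is only that when every COMPOUND is far below the limit, raising
the limit changes nothing.

```python
from statistics import mean


def summarize_compound_sizes(op_counts, max_ops):
    """Return (mean ops per COMPOUND, fraction of COMPOUNDs above max_ops)."""
    if not op_counts:
        raise ValueError("need at least one COMPOUND")
    over = sum(1 for n in op_counts if n > max_ops)
    return mean(op_counts), over / len(op_counts)


# Hypothetical per-COMPOUND op counts, made up for illustration only.
sample = [3, 4, 5, 4, 6, 3, 7, 4]

for limit in (50, 100):
    avg, frac_over = summarize_compound_sizes(sample, limit)
    print(f"max-ops={limit}: mean={avg:.1f}, over limit={frac_over:.0%}")
```

With real data, the op counts would come from decoding each COMPOUND
in the capture; a non-zero "over limit" fraction would be the hard
evidence that the limit matters for a given workload.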

We already know that Solaris and FreeBSD send large
COMPOUNDs at mount time. And in particular, Solaris
and FreeBSD clients do not walk path names as part of
OPEN, READ, or WRITE operations, since both have very
capable directory name caches. So I honestly feel that
the path name walk thing is a red herring for Dan's
issue.

If the workloads involve complex readv() and writev()
system calls, these client implementations /might/ be
building complex COMPOUNDs to handle those calls in
a single RPC. We need to see packet captures to
understand what's going on.

That is why IMO it's unwise to increase upstream's
NFSD_MAX_OPS_PER_COMPOUND value without a proper
root-cause analysis. So far I have not seen any
convincing hard data that suggests that increasing
max-ops is doing anything but masking a deeper
problem.

For Roland's client, as I said, NFSv4.1 clients
have to stay within the bounds of the server's
max-ops, and clients have no control over that. NFSD
might be changed to provide a larger max-ops, but
you guys have no control over other server
implementations. The better approach is to manage
what you do have control over.
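
As a rough illustration of managing this on the client side, the sketch
below splits a path walk into several COMPOUNDs that each fit within a
server-advertised max-ops (in NFSv4.1 the server returns this as
ca_maxoperations in CREATE_SESSION). The function name and the op layout
are assumptions made for illustration, not how any particular client is
implemented: each COMPOUND starts from a filehandle (PUTROOTFH for the
first, PUTFH with the previously returned handle afterwards) and ends
with GETFH, leaving max_ops - 2 slots for LOOKUPs.

```python
def plan_path_walk(components, max_ops):
    """Split a path walk into COMPOUNDs of at most max_ops operations.

    Hypothetical planner: returns a list of COMPOUNDs, each a list of
    operation names. It only plans; it does not talk to a server.
    """
    if max_ops < 3:
        raise ValueError("need room for a filehandle op, one LOOKUP, and GETFH")
    per_compound = max_ops - 2  # slots left after the first op and GETFH
    compounds = []
    for start in range(0, len(components), per_compound):
        # The first COMPOUND starts at the root; later ones resume from
        # the filehandle returned by the previous COMPOUND's GETFH.
        first = "PUTROOTFH" if start == 0 else "PUTFH"
        chunk = components[start:start + per_compound]
        lookups = [f"LOOKUP {name}" for name in chunk]
        compounds.append([first, *lookups, "GETFH"])
    return compounds


# Seven one-character components against a (deliberately tiny) max-ops of 5.
for compound in plan_path_walk(list("abcdefg"), 5):
    print(len(compound), compound)
```

Note that the op count depends on the number of components, not on the
character length of the path, and that a client which chunks this way
never exposes the server's limit to applications.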

--
Chuck Lever