|ca_maxoperations| - tuneable ? / was: Re: RFE: Linux nfsd's |ca_maxoperations| should be at least |64| ... / was: Re: kernel.org list issues... / was: Fwd: Turn NFSD_MAX_* into tuneables ? / was: Re: Increasing NFSD_MAX_OPS_PER_COMPOUND to 96

Roland Mainz <roland.mainz@xxxxxxxxxxx> · Sat, 16 Mar 2024 12:55:10 +0100

On Thu, Jan 18, 2024 at 3:52 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> > On Jan 18, 2024, at 4:44 AM, Martin Wege <martin.l.wege@xxxxxxxxx> wrote:
> > On Thu, Jan 18, 2024 at 2:57 AM Roland Mainz <roland.mainz@xxxxxxxxxxx> wrote:
> >> On Sat, Jan 13, 2024 at 5:10 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >>>> On Jan 13, 2024, at 10:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >>>> On Sat, 2024-01-13 at 15:47 +0100, Roland Mainz wrote:
> >>>>> On Sat, Jan 13, 2024 at 1:19 AM Dan Shelton <dan.f.shelton@xxxxxxxxx> wrote:
[snip]
> >> That assumes that no process does random access into deep subdirs. In
> >> that case the performance is absolutely terrible, unless you devote
> >> lots of memory to a giant cache (which is not feasible due to cache
> >> expiration limits, unless someone (please!) finally implements
> >> directory delegations).
>
> Do you mean not feasible for your client? Lookup caches
> have been part of operating systems for decades. Solaris,
> FreeBSD, and Linux all have one. Does the Windows kernel
> have one that ms-nfs41-client can use?

The ms-nfs41-client has its own cache.
Technically Windows has another, but that is in the kernel and
difficult to connect to the NFS client daemon without performance
issues.

[snip]
> Sending a full path in a single COMPOUND is one way to
> handle path resolution, but it has so many limitations
> that it's really not the mechanism of choice.

Which limitations ?

The reason why I am looking to stuff more info into a request:
- VPN has very high latency, so splitting requests hurts performance *BADLY*.
I've been slapped about path/dir lookup performance many times now,
and while there is more than one issue (Cygwin looks for "file" and
"file.lnk"&co for each file + our readdir implementation needs lots of
work), the biggest issue is that we split requests up because they
usually do not fit.
- Windows API is async+multithreaded, so requests do not always
arrive in the logical/expected/useful order, which leads
to cache issues.
Seriously, this issue is so bad that it is worth a research paper.
- Real-world paths on Windows are LONG with many subdirs, even worse
when projects and organisations change, shift, reorganise, move,
merge, split, get outsourced etc. over *DECADES*. Plus non-IT-users
have zero awareness about "path limits", and sometimes dump whole
sentences into directory names (e.g. "customer XYZ. can be ignored he
terminated the business relationship on 26 May 2001. please do not
delete dir" <----- xxx@@!!!! ).
That issue haunts us in other ways too, e.g. in the ms-nfs41-client
project I had to extend the maximum supported path length multiple
times to support this craziness. Right now we support 4096 byte paths
([1]), with the longest known path being 1772, and others have
reported even longer ones.
And this issue is not specific to my current employer, I've seen
it in customer installations when I was at Sun (including long
debates about Solaris's 1024 byte limit) and Red Hat too.

[1]=Windows opened the next Pandora's box by removing the MAX_PATH
limit a while ago, e.g. see
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
- and even before that there was the "\\?\" prefix.

[snip]
> > ca_maxoperations:
> >     The maximum number of operations the replier will accept
> >     in a COMPOUND or CB_COMPOUND. For the backchannel, the
> >     server MUST NOT change the value the client offers. For
> >     the fore channel, the server MAY change the requested
> >     value. After the session is created, if a requester sends
> >     a COMPOUND or CB_COMPOUND with more operations than
> >     ca_maxoperations, the replier MUST return
> >     NFS4ERR_TOO_MANY_OPS.
>
> The BCP 14 "MAY" here means that servers can return the same
> value, but clients have to expect that a server might return
> something different.
>
> Further, the spec does not permit an NFS server to respond to
> a COMPOUND with more than the client's ca_maxoperations in
> any way other than to return NFS4ERR_TOO_MANY_OPS. So it
> cannot return a larger ca_maxoperations than the client sent.
>
> NFSD returns the minimum of the client's max-ops and its own
> NFSD_MAX_OPS_PER_COMPOUND value, which is 50. Thus NFSD will
> return the same value as the client, unless the client asks
> for more than 50.

I finally (yay - Saturday) had a look at this issue and
collected&&processed statistics.
With a Linux 6.6.20-rt25 kernel nfsd I get this in the ms-nfs41-client:
---- snip ----
1010: requested: req.csa_fore_chan_attrs.(ca_maxoperations=16384,
ca_maxrequests=128)
1010: response:  session->fore_chan_attrs->(ca_maxoperations=50,
ca_maxrequests=66)
---- snip ----

So - if I understand it correctly - the negotiation works correctly,
and we get |ca_maxoperations=50| and |ca_maxrequests=66|.
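
To make sure I read the negotiation rule quoted above the same way,
here is a minimal sketch of it. This is NOT the nfsd code: the function
name |negotiate_maxops| is made up for illustration, and the only fact
taken from this thread is that the server replies with the minimum of
the client's request and its own cap of 50.

```python
# Hypothetical sketch of the fore-channel ca_maxoperations negotiation
# described in this thread; not the actual Linux nfsd implementation.

# Server-side cap quoted in this thread (NFSD_MAX_OPS_PER_COMPOUND).
NFSD_MAX_OPS_PER_COMPOUND = 50


def negotiate_maxops(client_requested: int,
                     server_limit: int = NFSD_MAX_OPS_PER_COMPOUND) -> int:
    """Return the ca_maxoperations value the server would reply with.

    The server never returns more than the client asked for, and never
    more than its own limit, so the reply is the minimum of the two.
    """
    return min(client_requested, server_limit)


if __name__ == "__main__":
    # The ms-nfs41-client request from the log above.
    print(negotiate_maxops(16384))
    # A client asking for less than the server cap gets its own value back.
    print(negotiate_maxops(32))
```

With the request from the log (16384) this yields 50, matching the
response I see. |ca_maxrequests| is deliberately left out because I do
not know how the server derives the 66.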

But... this value is too small, at least for what we do on Windows.
I've collected samples (84 machines, a wide range of users, MS Office,
ERP, CAD, etc.) and 71% of all server lookup calls had to be split
(Linux 6.6 LTS kernel nfsd) for |ca_maxoperations==50|, 39% for
|ca_maxoperations==64| and <1% for |ca_maxoperations==80|.
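
To illustrate why the limit matters on a high-latency link, here is a
rough model of how many COMPOUNDs a single path lookup needs. The
numbers are assumptions for illustration only, not the real
ms-nfs41-client request layout: I assume a fixed overhead of 3
operations per COMPOUND (|fixed_ops|) and 2 operations per path
component (|ops_per_component|), and that every COMPOUND costs one
round trip.

```python
# Rough model of splitting one path lookup into several COMPOUNDs.
# fixed_ops and ops_per_component are ASSUMED values for illustration;
# they are not taken from the ms-nfs41-client sources.


def compounds_needed(components: int, ca_maxoperations: int,
                     fixed_ops: int = 3, ops_per_component: int = 2) -> int:
    """Number of COMPOUNDs needed to resolve a path with `components` parts."""
    # How many path components fit into one COMPOUND after the overhead.
    per_compound = (ca_maxoperations - fixed_ops) // ops_per_component
    if per_compound < 1:
        raise ValueError("ca_maxoperations too small for even one component")
    # Ceiling division; even an empty path costs one request.
    return max(1, -(-components // per_compound))


def lookup_latency_ms(components: int, ca_maxoperations: int,
                      rtt_ms: float) -> float:
    """Latency estimate: one network round trip per COMPOUND."""
    return compounds_needed(components, ca_maxoperations) * rtt_ms


if __name__ == "__main__":
    for limit in (50, 64, 80):
        # Hypothetical 30-component path over a VPN with 80 ms round trip.
        print(limit, compounds_needed(30, limit),
              lookup_latency_ms(30, limit, rtt_ms=80.0))
```

Under these assumptions a 30-component path needs two round trips with
a limit of 50 but only one with 64 or 80. The real split rate depends
on the actual per-component operation count, which is why the measured
percentages above differ from this toy model.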

Question is... should the values for |ca_*| be tuneables, or should we
just increase the limit to |80| ([2]) ?

[2]=I can provide the patch, with sufficient curses about Windows
*USERS* included...

----

Bye,
Roland
-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz@xxxxxxxxxxx
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
