> On Jan 18, 2024, at 4:44 AM, Martin Wege <martin.l.wege@xxxxxxxxx> wrote:
> 
> On Thu, Jan 18, 2024 at 2:57 AM Roland Mainz <roland.mainz@xxxxxxxxxxx> wrote:
>> 
>> On Sat, Jan 13, 2024 at 5:10 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>> On Jan 13, 2024, at 10:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>>> On Sat, 2024-01-13 at 15:47 +0100, Roland Mainz wrote:
>>>>> On Sat, Jan 13, 2024 at 1:19 AM Dan Shelton <dan.f.shelton@xxxxxxxxx> wrote:
>> [snip]
>>>>> Is this the windows client?
>>>> No, the ms-nfs41-client (see
>>>> https://github.com/kofemann/ms-nfs41-client) uses a limit of |16|, but
>>>> it is on our ToDo list to bump that to |128| (but honoring the limit
>>>> set by the NFSv4.1 server during session negotiation) since it now
>>>> supports very long paths ([1]) and this issue is a known performance
>>>> bottleneck.
>>>
>>> A better way to optimize this case is to walk the path once
>>> and cache the terminal component's file handle. This is what
>>> Linux does, and it sounds like Dan's directory walker
>>> optimizations do effectively the same thing.
>>
>> That assumes that no process does random access into deep subdirs. In
>> that case the performance is absolutely terrible, unless you devote
>> lots of memory to a giant cache (which is not feasible due to cache
>> expiration limits, unless someone (please!) finally implements
>> directory delegations).

Do you mean not feasible for your client? Lookup caches have been part
of operating systems for decades. Solaris, FreeBSD, and Linux all have
one. Does the Windows kernel have one that ms-nfs41-client can use?

>> This also ignores the use case of WAN (wide-area networks) and WLAN
>> with the typical high latency and even higher amounts of network
>> packet loss && retransmit, where the splitting of the requests comes
>> with a HUGE latency penalty (you can reproduce this with network
>> tools: just export a large tmpfs on the server, add a packet delay of
>> 400ms between client and server, use a path like
>> "a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/0/1/2/3/4/5/6/7/8/9",
>> and compile gcc).

The most frequently implemented solution to this problem is a lookup
cache. Operating systems use one for local on-disk filesystems as well
as for NFS.

In the local filesystem case: Think about how long each path resolution
would take if the operating system had to consult on-disk information
for every component in the pathname.

In the NFS case: The fastest round trip is no round trip. Keep a local
cache, and path resolution will be fast no matter what the network
latency is.

Note that the NFS server is going to use a lookup cache to make large
path resolution COMPOUNDs go fast. It would be even faster (from the
application's point of view) if that cache were local to the client.

Sending a full path in a single COMPOUND is one way to handle path
resolution, but it has so many limitations that it's really not the
mechanism of choice.

>> And in the real world the Linux nfsd |ca_maxoperations| default of
>> |16| is absolutely CRIPPLING.
>> For example, in the ms-nfs41-client we need 4 operations for the
>> initial setup of a file lookup, and then 3 per path component. That
>> means that a default of 16 just fits (16-4)/3 = 4 path elements.
>> Unfortunately the statistical average is not 4 - it's 11 (measured
>> over five weeks with 81 clients in our company).
>> Technically, in this scenario, a default of at least 11*3+4 = 37 would
>> be MUCH better.
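Just to make that arithmetic concrete, here is a tiny sketch. It only
restates the figures quoted above (4 setup operations plus 3 operations
per path component); the helper name is invented for illustration and
nothing here is taken from either client's source code:

  #include <stdio.h>

  /*
   * Back-of-the-envelope check of the figures quoted above:
   * 4 operations of per-lookup setup plus 3 operations per
   * path component.
   */
  static unsigned int ops_needed(unsigned int path_components)
  {
          return 4 + 3 * path_components;
  }

  int main(void)
  {
          /* A 16-op budget fits (16 - 4) / 3 = 4 path components. */
          printf("ops needed for  4 components: %u\n", ops_needed(4));  /* 16 */

          /* The measured average depth of 11 needs 11*3 + 4 = 37 ops. */
          printf("ops needed for 11 components: %u\n", ops_needed(11)); /* 37 */
          return 0;
  }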
>>
>> That's why I think nfsd's |ca_maxoperations| should be at *least* |64|.
>
> +1
>
> I consider the default value of 16 even a bug, given the circumstances.

This is not an NFSD bug. Read to the bottom to see where the real
problem is.

Here are the CREATE_SESSION arguments from a Linux client:

  csa_fore_chan_attrs
    hdr pad size: 0
    max req size: 1049620
    max resp size: 1049480
    max resp size cached: 7584
    max ops: 8
    max reqs: 64
  csa_back_chan_attrs
    hdr pad size: 0
    max req size: 4096
    max resp size: 4096
    max resp size cached: 0
    max ops: 2
    max reqs: 16

The ca_maxoperations field contains 8.

The response from NFSD looks like this:

  csr_fore_chan_attrs
    hdr pad size: 0
    max req size: 1049620
    max resp size: 1049480
    max resp size cached: 2128
    max ops: 8
    max reqs: 30
  csr_back_chan_attrs
    hdr pad size: 0
    max req size: 4096
    max resp size: 4096
    max resp size cached: 0
    max ops: 2
    max reqs: 16

The ca_maxoperations field again contains 8.

Here's what RFC 8881 Section 18.36.3 says:

> ca_maxoperations:
>    The maximum number of operations the replier will accept
>    in a COMPOUND or CB_COMPOUND. For the backchannel, the
>    server MUST NOT change the value the client offers. For
>    the fore channel, the server MAY change the requested
>    value. After the session is created, if a requester sends
>    a COMPOUND or CB_COMPOUND with more operations than
>    ca_maxoperations, the replier MUST return
>    NFS4ERR_TOO_MANY_OPS.

The BCP 14 "MAY" here means that servers can return the same value, but
clients have to expect that a server might return something different.

Further, the spec does not permit an NFS server to respond to a
COMPOUND with more operations than the client's ca_maxoperations in any
way other than to return NFS4ERR_TOO_MANY_OPS. So it cannot return a
larger ca_maxoperations than the client sent.

NFSD returns the minimum of the client's max-ops and its own
NFSD_MAX_OPS_PER_COMPOUND value, which is 50. Thus NFSD will return the
same value as the client, unless the client asks for more than 50.

So, the only reason NFSD returns 16 to your client is that your client
sets a value of 16 in its CREATE_SESSION Call. If your client sent a
larger value (like 11*3+4), then NFSD would respect that limit instead.

The spec is very clear about how this needs to work, and NFSD is 100%
compliant with the spec here. It's the client that has to request a
larger limit.

--
Chuck Lever
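For completeness, here is a minimal sketch of the forechannel
negotiation described above. It is an illustration only, not the actual
NFSD code path; the function name is invented, and the only figure
taken from the discussion is the 50-operation NFSD_MAX_OPS_PER_COMPOUND
ceiling:

  #include <stdio.h>

  #define NFSD_MAX_OPS_PER_COMPOUND 50    /* NFSD's own ceiling */

  /*
   * Sketch of the forechannel ca_maxoperations negotiation: the server
   * answers with the smaller of the client's requested value and its
   * own per-COMPOUND limit, so it never exceeds what the client offered.
   */
  static unsigned int fore_chan_maxops(unsigned int client_maxops)
  {
          return client_maxops < NFSD_MAX_OPS_PER_COMPOUND ?
                 client_maxops : NFSD_MAX_OPS_PER_COMPOUND;
  }

  int main(void)
  {
          printf("client asks 16  -> server answers %u\n",
                 fore_chan_maxops(16));   /* 16 */
          printf("client asks 37  -> server answers %u\n",
                 fore_chan_maxops(37));   /* 37 */
          printf("client asks 128 -> server answers %u\n",
                 fore_chan_maxops(128));  /* 50 */
          return 0;
  }

The takeaway matches the text above: the server never answers with more
than the client offered, so a client that wants a larger forechannel
budget has to ask for one in its CREATE_SESSION Call.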