On Wed, 29 Jan 2025 at 16:02, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 1/29/25 2:32 AM, Cedric Blancher wrote:
> > On Wed, 22 Jan 2025 at 11:07, Cedric Blancher <cedric.blancher@xxxxxxxxx> wrote:
> >>
> >> Good morning!
> >>
> >> IMO it might be good to increase RPCSVC_MAXPAYLOAD to at least 8M,
> >> giving the NFSv4.1 session mechanism some headroom for negotiation.
> >> For over a decade the default value has been 1M (1*1024*1024u),
> >> which IMO causes problems with anything faster than 2500baseT.
> >
> > The 1MB limit was defined when 10base5/10baseT was the norm, and
> > 100baseT (100mbit) was "fast".
> >
> > Nowadays 1000baseT is the norm, 2500baseT is in premium *laptops*,
> > and 10000baseT is fast. The 1MB limit alone is now in the way of
> > EVERYTHING, including "large send offload" and other acceleration
> > features.
> >
> > So my suggestion is to increase the buffer to 4MB by default (2*2MB
> > hugepages on x86), and allow a tuneable to select up to 16MB.
>
> TL;DR: This has been on the long-term to-do list for NFSD for quite
> some time.
>
> We certainly want to support larger COMPOUNDs, but increasing
> RPCSVC_MAXPAYLOAD is only the first step.
>
> The biggest obstacle is the rq_pages[] array in struct svc_rqst.
> Today it has 259 entries. Quadrupling that would make the array
> itself multiple pages in size, and there's one of these for each
> nfsd thread.
>
> We are working on replacing the use of page arrays with folios,
> which would make this infrastructure significantly smaller and
> faster, but it depends on folio support in all the kernel APIs that
> NFSD makes use of. That situation continues to evolve.
>
> An equivalent issue exists in the Linux NFS client.
>
> Increasing this capability on the server without having a client
> that can make use of it doesn't seem wise.
>
> You can try increasing the value of RPCSVC_MAXPAYLOAD yourself and
> run some measurements to help make the case (and analyze the
> operational costs). I think we need some confidence that increasing
> the maximum payload size will not unduly impact small I/O.
>
> Re: a tunable: I'm not sure why someone would want to tune this
> number down from the maximum. You can control how much total memory
> the server consumes by reducing the number of nfsd threads.

I want a tuneable for TESTING, i.e. keep the lower default (for now),
but allow people to grab a stock Linux kernel, increase the tunable,
and do their own testing. Not everyone is happy with the voodoo of
building their own test kernels, even more so in the (dark) "Age Of
SecureBoot", where a signed kernel is mandatory.

Therefore: Tuneable.

Ced

--
Cedric Blancher <cedric.blancher@xxxxxxxxx>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
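
P.S.: To put rough numbers on the rq_pages[] sizing concern above,
here is a back-of-the-envelope sketch in userspace C. It mimics the
RPCSVC_MAXPAGES arithmetic from include/linux/sunrpc/svc.h; the exact
"+ 2 + 1" header/tail padding is my assumption, reverse-engineered
from the 259-entry figure quoted above (256 payload pages + 3), so
treat the output as an estimate, not a reading of the real code:

/* rq_pages_size.c - estimate the per-thread rq_pages[] footprint
 * for a given maximum RPC payload.
 * Build: cc -Wall -o rq_pages_size rq_pages_size.c
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL        /* x86-64 base pages */

/* Assumed mirror of RPCSVC_MAXPAGES: payload pages plus a few
 * extra entries for the RPC request and reply headers. */
static unsigned long maxpages(unsigned long maxpayload)
{
        return (maxpayload + PAGE_SIZE - 1) / PAGE_SIZE + 2 + 1;
}

int main(void)
{
        unsigned long payloads[] = { 1UL << 20, 4UL << 20, 16UL << 20 };

        for (int i = 0; i < 3; i++) {
                unsigned long n = maxpages(payloads[i]);
                unsigned long bytes = n * sizeof(void *);

                printf("%2lu MB payload: %4lu entries, %6lu bytes "
                       "(%lu pages) per nfsd thread\n",
                       payloads[i] >> 20, n, bytes,
                       (bytes + PAGE_SIZE - 1) / PAGE_SIZE);
        }
        return 0;
}

With 64-bit page pointers, today's 259 entries fit in a single page;
at 4MB the array already spills into a third page, and at 16MB it
needs nine - multiplied by the nfsd thread count, which is exactly
the cost Chuck describes.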