On Wed, 29 Jan 2025 at 16:02, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 1/29/25 2:32 AM, Cedric Blancher wrote:
> > On Wed, 22 Jan 2025 at 11:07, Cedric Blancher <cedric.blancher@xxxxxxxxx> wrote:
> >>
> >> Good morning!
> >>
> >> IMO it might be good to increase RPCSVC_MAXPAYLOAD to at least 8M,
> >> giving the NFSv4.1 session mechanism some headroom for negotiation.
> >> For over a decade the default value has been 1M (1*1024*1024u),
> >> which IMO causes problems with anything faster than 2500baseT.
> >
> > The 1MB limit was defined when 10base5/10baseT was the norm, and
> > 100baseT (100mbit) was "fast".
> >
> > Nowadays 1000baseT is the norm, 2500baseT is in premium *laptops*,
> > and 10000baseT is fast. The 1MB limit alone is now in the way of
> > EVERYTHING, including "large send offload" and other acceleration
> > features.
> >
> > So my suggestion is to increase the buffer to 4MB by default (2*2MB
> > hugepages on x86), and allow a tuneable to select up to 16MB.
>
> TL;DR: This has been on the long-term to-do list for NFSD for quite
> some time.
>
> We certainly want to support larger COMPOUNDs, but increasing
> RPCSVC_MAXPAYLOAD is only the first step.
>
> The biggest obstacle is the rq_pages[] array in struct svc_rqst.
> Today it has 259 entries. Quadrupling that would make the array
> itself multiple pages in size, and there's one of these for each
> nfsd thread.
>
> We are working on replacing the use of page arrays with folios,
> which would make this infrastructure significantly smaller and
> faster, but it depends on folio support in all the kernel APIs that
> NFSD makes use of. That situation continues to evolve.
>
> An equivalent issue exists in the Linux NFS client.
>
> Increasing this capability on the server without having a client
> that can make use of it doesn't seem wise.
>
> You can try increasing the value of RPCSVC_MAXPAYLOAD yourself and
> run some measurements to help make the case (and analyze the
> operational costs). I think we need some confidence that increasing
> the maximum payload size will not unduly impact small I/O.
>
> Re: a tunable: I'm not sure why someone would want to tune this
> number down from the maximum. You can control how much total memory
> the server consumes by reducing the number of nfsd threads.

I want a tuneable for TESTING, i.e. keep the lower default (for now),
but allow people to grab a stock Linux kernel, increase the tunable,
and do their own testing. Not everyone is happy with the voodoo of
building their own test kernels, even more so in the (dark) "Age Of
SecureBoot", where a signed kernel is mandatory.

Therefore: Tuneable.

Ced

--
Cedric Blancher <cedric.blancher@xxxxxxxxx>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
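
P.S.: To put rough numbers on the rq_pages[] sizing concern above,
here is a back-of-the-envelope sketch in userspace C. It mimics the
RPCSVC_MAXPAGES arithmetic from include/linux/sunrpc/svc.h; the exact
"+ 2 + 1" header/tail padding is my assumption, reverse-engineered
from the 259-entry figure quoted above (256 payload pages + 3), so
treat the output as an estimate, not a reading of the real code:

/* rq_pages_size.c - estimate the per-thread rq_pages[] footprint
 * for a given maximum RPC payload.
 * Build: cc -Wall -o rq_pages_size rq_pages_size.c
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL        /* x86-64 base pages */

/* Assumed mirror of RPCSVC_MAXPAGES: payload pages plus a few
 * extra entries for the RPC request and reply headers. */
static unsigned long maxpages(unsigned long maxpayload)
{
        return (maxpayload + PAGE_SIZE - 1) / PAGE_SIZE + 2 + 1;
}

int main(void)
{
        unsigned long payloads[] = { 1UL << 20, 4UL << 20, 16UL << 20 };

        for (int i = 0; i < 3; i++) {
                unsigned long n = maxpages(payloads[i]);
                unsigned long bytes = n * sizeof(void *);

                printf("%2lu MB payload: %4lu entries, %6lu bytes "
                       "(%lu pages) per nfsd thread\n",
                       payloads[i] >> 20, n, bytes,
                       (bytes + PAGE_SIZE - 1) / PAGE_SIZE);
        }
        return 0;
}

With 64-bit page pointers, today's 259 entries fit in a single page;
at 4MB the array already spills into a third page, and at 16MB it
needs nine - multiplied by the nfsd thread count, which is exactly
the cost Chuck describes.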