On Fri, 2025-02-21 at 10:36 -0500, Mike Snitzer wrote:
> On Fri, Feb 21, 2025 at 10:25:03AM -0500, Jeff Layton wrote:
> > On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> > > On Thu, Feb 20, 2025 at 01:17:42PM -0500, Chuck Lever wrote:
> > > > [ Adding NFSD reviewers ... ]
> > > > 
> > > > On 2/20/25 12:12 PM, Mike Snitzer wrote:
> > > > > Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> > > > > by nfsd will be removed from the page cache upon completion."
> > > > > 
> > > > > nfsd_dontcache is disabled by default.  It may be enabled with:
> > > > > 
> > > > >   echo Y > /sys/module/nfsd/parameters/nfsd_dontcache
> > > > 
> > > > A per-export setting like an export option would be nicer.  Also, does
> > > > it make sense to have a separate control for READ and one for WRITE?
> > > > My trick knee suggests caching read results is still going to add
> > > > significant value, but write, not so much.
> > > 
> > > My intent was to make 6.14's DONTCACHE feature able to be tested in
> > > the context of nfsd in a no-frills way.  I realize adding the
> > > nfsd_dontcache knob skews toward too raw, lacks polish.  But I'm
> > > inclined to expose such coarse-grained opt-in knobs to encourage
> > > others' discovery (and answers to some of the questions you pose
> > > below).  I also hope to enlist all NFSD reviewers' help in
> > > categorizing/documenting where DONTCACHE helps/hurts. ;)
> > > 
> > > And I agree that ultimately per-export control is needed.  I'll take
> > > the time to implement that, hopeful to have something more suitable
> > > in time for LSF.
> > 
> > Would it make more sense to hook DONTCACHE up to the IO_ADVISE
> > operation in RFC 7862?  IO_ADVISE4_NOREUSE sounds like it has a
> > similar meaning?  That would give the clients a way to do this on a
> > per-open basis.
> 
> Just thinking aloud here but: using a DONTCACHE scalpel on a per-open
> basis quite likely wouldn't provide the required page reclaim relief
> if the server is being hammered with normal buffered IO.  Sure, that
> particular DONTCACHE IO wouldn't contribute to the problem, but it
> would still be impacted by those not opting to use DONTCACHE on entry
> to the server, since it still needs pages for its DONTCACHE buffered IO.

Actually, now that I read the spec, it looks like you could just embed
an IO_ADVISE operation in the read compound:

  PUTFH + IO_ADVISE(for the range that you're reading) + READ()

That said, that does nothing for v3 reads, which I imagine you're
interested in hooking up here too.

> > > > However, to add any such administrative control, I'd like to see some
> > > > performance numbers.  I think we need to enumerate the cases (I/O types)
> > > > that are most interesting to examine: small-memory NFS servers; lots of
> > > > small unaligned I/O; server-side CPU per byte; storage interrupt rates;
> > > > any others?
> > > > 
> > > > And let's see some user/admin documentation (e.g. when should this
> > > > setting be enabled? when would it be contra-indicated?)
> > > > 
> > > > The same arguments that applied to Cedric's request to make maximum RPC
> > > > size a tunable setting apply here.  Do we want to carry a manual setting
> > > > for this mechanism for a long time, or do we expect that the setting can
> > > > become automatic/uninteresting after a period of experimentation?
> > > > 
> > > > * It might be argued that putting these experimental tunables under /sys
> > > > eliminates the support longevity question, since there aren't strict
> > > > rules about removing files under /sys.
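For context on the /sys question in the exchange just below: the
nfsd_dontcache knob quoted at the top of the thread is a plain module
parameter, which is what makes it show up as a writable file under
/sys/module/nfsd/parameters/.  A minimal sketch of how such a knob is
typically declared follows; the name and description track the quoted
patch text, but the exact declaration and how it gets wired into nfsd's
I/O path are assumptions here, not taken from the actual patch.

  /*
   * Sketch only, not the actual patch: a writable bool module parameter
   * like this is what surfaces as /sys/module/nfsd/parameters/nfsd_dontcache.
   * The real change presumably also wires the flag through to the 6.14
   * drop-behind (DONTCACHE) buffered I/O support in nfsd's read/write paths.
   */
  #include <linux/module.h>

  static bool nfsd_dontcache;
  module_param(nfsd_dontcache, bool, 0644);
  MODULE_PARM_DESC(nfsd_dontcache,
          "Remove data read or written by nfsd from the page cache upon completion");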
> > Isn't /sys covered by the same ABI guarantees?  I know debugfs isn't,
> > but I'm not sure about /sys.
> 
> Only if you add them to the ABI docs as supported (at least that is my
> experience relative to various block limits knobs, etc).  But yeah,
> invariably that invites a cat-and-mouse game of users using the knob
> and then complaining loudly if/when it goes away.
> 
> Mike

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
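A side note for anyone who wants to see what the 6.14 DONTCACHE
behavior discussed above actually does before it is wired into nfsd:
the primitive is exposed to userspace as a per-call RWF_DONTCACHE flag
to pwritev2()/preadv2().  The sketch below is illustrative only; the
flag value is copied from the 6.14 uapi and should be verified against
<linux/fs.h>, and the file path and buffer size are arbitrary.

  /*
   * Standalone demo of the drop-behind buffered I/O primitive that the
   * proposed nfsd_dontcache knob would opt nfsd into: a buffered write
   * whose pages are dropped from the page cache once writeback completes.
   * Needs a 6.14+ kernel and a filesystem that supports the flag;
   * otherwise pwritev2() fails with EOPNOTSUPP.
   */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/uio.h>
  #include <unistd.h>

  #ifndef RWF_DONTCACHE
  #define RWF_DONTCACHE 0x80      /* per the 6.14 uapi; verify against <linux/fs.h> */
  #endif

  int main(void)
  {
          char buf[4096];
          struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
          int fd = open("/tmp/dontcache-demo", O_CREAT | O_TRUNC | O_WRONLY, 0644);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          memset(buf, 'x', sizeof(buf));

          /* Buffered write; pages are invalidated once writeback finishes. */
          if (pwritev2(fd, &iov, 1, 0, RWF_DONTCACHE) < 0)
                  perror("pwritev2(RWF_DONTCACHE)");
          else
                  printf("4096 bytes written without lingering in the page cache\n");

          close(fd);
          return 0;
  }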