On Fri, 2025-02-21 at 16:13 +0000, Trond Myklebust wrote:
> On Fri, 2025-02-21 at 10:46 -0500, Chuck Lever wrote:
> > On 2/21/25 10:36 AM, Mike Snitzer wrote:
> > > On Fri, Feb 21, 2025 at 10:25:03AM -0500, Jeff Layton wrote:
> > > > On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> > > > > My intent was to make 6.14's DONTCACHE feature able to be
> > > > > tested in the context of nfsd in a no-frills way. I realize
> > > > > adding the nfsd_dontcache knob skews toward too raw, lacks
> > > > > polish. But I'm inclined to expose such coarse-grained opt-in
> > > > > knobs to encourage others' discovery (and answers to some of
> > > > > the questions you pose below). I also hope to enlist all NFSD
> > > > > reviewers' help in categorizing/documenting where DONTCACHE
> > > > > helps/hurts. ;)
> > > > >
> > > > > And I agree that ultimately per-export control is needed.
> > > > > I'll take the time to implement that, hopeful to have
> > > > > something more suitable in time for LSF.
> > > >
> > > > Would it make more sense to hook DONTCACHE up to the IO_ADVISE
> > > > operation in RFC7862? IO_ADVISE4_NOREUSE sounds like it has a
> > > > similar meaning? That would give the clients a way to do this
> > > > on a per-open basis.
> > >
> > > Just thinking aloud here but: using a DONTCACHE scalpel on a
> > > per-open basis quite likely wouldn't provide the required page
> > > reclaim relief if the server is being hammered with normal
> > > buffered IO. Sure, that particular DONTCACHE IO wouldn't
> > > contribute to the problem, but it would still be impacted by
> > > those not opting into DONTCACHE on entry to the server, since it
> > > still needs pages for its DONTCACHE buffered IO.
> >
> > For this initial work, which is to provide a mechanism for
> > experimentation, IMO exposing the setting to clients won't be all
> > that helpful.
> >
> > But there are some applications/workloads on clients where exposure
> > could be beneficial -- for instance, a backup job, where NFSD would
> > benefit by knowing it doesn't have to maintain the job's written
> > data in its page cache. I regard that as a later evolutionary
> > improvement, though.
> >
> > Jorge proposed adding the NFSv4.2 IO_ADVISE operation to NFSD, but
> > I think we first need to a) work out and document appropriate
> > semantics for each hint, because the spec does not provide
> > specifics, and b) perform some extensive benchmarking to understand
> > their value and impact.
>
> That puts the onus on the application running on the client to decide
> the caching semantics of the server, which:
> A. Is a terrible idea™. The application may know how it wants to use
>    the cached data, and may be able to somewhat confidently manage
>    its own pagecache. However, in almost all cases it will have no
>    basis for understanding how the server should manage its cache.
>    The latter really is a job for the sysadmin to figure out.
> B. Is impractical, because even if you can figure out a policy, it
>    requires rewriting the application to manage the server cache.
> C. Will require additional APIs on the NFSv4.2 client to expose the
>    IO_ADVISE operation. You cannot just map it to posix_fadvise()
>    and/or posix_madvise(), because IO_ADVISE is designed to manage a
>    completely different caching layer.
>    At best, we might be able to rally one or two more distributed
>    filesystems to implement similar functionality and share an API;
>    however, there is no chance this API will be useful for ordinary
>    filesystems.

You could map this to RWF_DONTCACHE itself. I know that's really
intended as a hint to the local kernel, but it seems reasonable that if
the application is giving the kernel a DONTCACHE hint, we could pass
that along to the server as well. The server is under no obligation to
do anything with it, just like the kernel with RWF_DONTCACHE.

We could put an IO_ADVISE in a READ or READ_PLUS compound like so:

    PUTFH + IO_ADVISE(IO_ADVISE4_NOREUSE for the ranges being read) + READ_PLUS or READ

...

On the server, we could track those ranges in the compound and enable
RWF_DONTCACHE for any subsequent reads or writes.

All that said, I don't object to some sort of mechanism to turn this on
more globally, particularly since that would allow us to use this with
v3 I/O as well.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
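
As a footnote to the RWF_DONTCACHE discussion above, here is a minimal
userspace sketch of the hint an application would give the (client)
kernel: a buffered write issued with pwritev2(2) and RWF_DONTCACHE.
This is illustrative only: the test file path is hypothetical, the
fallback #define is needed only when the uapi headers predate Linux
6.14 (verify the value against your <linux/fs.h>), and kernels or
filesystems without DONTCACHE support are expected to fail the call
with EOPNOTSUPP rather than accept it silently.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* assumed 6.14 uapi value; check <linux/fs.h> */
#endif

int main(int argc, char **argv)
{
	static char buf[64 * 1024];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	const char *path = argc > 1 ? argv[1] : "dontcache-test"; /* hypothetical file */
	ssize_t ret;
	int fd;

	memset(buf, 'x', sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Buffered write that also asks the kernel to drop these pages
	 * from the page cache once writeback completes, instead of
	 * leaving them around for reclaim to deal with later.
	 */
	ret = pwritev2(fd, &iov, 1, 0, RWF_DONTCACHE);
	if (ret < 0)
		perror("pwritev2(RWF_DONTCACHE)");
	else
		printf("wrote %zd bytes uncached\n", ret);

	close(fd);
	return ret < 0;
}

On the nfsd side, the coarse knob Mike describes presumably amounts to
passing the same flag on the buffered reads and writes the server
already issues, and the per-compound IO_ADVISE4_NOREUSE idea would do
the same only for the advised ranges; neither of those wirings is shown
here.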