On Fri, 2025-02-21 at 16:13 +0000, Trond Myklebust wrote:
> On Fri, 2025-02-21 at 10:46 -0500, Chuck Lever wrote:
> > On 2/21/25 10:36 AM, Mike Snitzer wrote:
> > > On Fri, Feb 21, 2025 at 10:25:03AM -0500, Jeff Layton wrote:
> > > > On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> > > > > My intent was to make 6.14's DONTCACHE feature able to be
> > > > > tested in the context of nfsd in a no-frills way. I realize
> > > > > adding the nfsd_dontcache knob skews toward too raw, lacks
> > > > > polish. But I'm inclined to expose such coarse-grained opt-in
> > > > > knobs to encourage others' discovery (and answers to some of
> > > > > the questions you pose below). I also hope to enlist all NFSD
> > > > > reviewers' help in categorizing/documenting where DONTCACHE
> > > > > helps/hurts. ;)
> > > > >
> > > > > And I agree that ultimately per-export control is needed.
> > > > > I'll take the time to implement that, hopeful to have
> > > > > something more suitable in time for LSF.
> > > >
> > > > Would it make more sense to hook DONTCACHE up to the IO_ADVISE
> > > > operation in RFC7862? IO_ADVISE4_NOREUSE sounds like it has a
> > > > similar meaning? That would give the clients a way to do this
> > > > on a per-open basis.
> > >
> > > Just thinking aloud here but: using a DONTCACHE scalpel on a
> > > per-open basis quite likely wouldn't provide the required page
> > > reclaim relief if the server is being hammered with normal
> > > buffered IO. Sure, that particular DONTCACHE IO wouldn't
> > > contribute to the problem, but it would still be impacted by
> > > those not opting into DONTCACHE on entry to the server, since it
> > > still needs pages for its DONTCACHE buffered IO.
> >
> > For this initial work, which is to provide a mechanism for
> > experimentation, IMO exposing the setting to clients won't be all
> > that helpful.
> >
> > But there are some applications/workloads on clients where exposure
> > could be beneficial -- for instance, a backup job, where NFSD would
> > benefit by knowing it doesn't have to maintain the job's written
> > data in its page cache. I regard that as a later evolutionary
> > improvement, though.
> >
> > Jorge proposed adding the NFSv4.2 IO_ADVISE operation to NFSD, but
> > I think we first need to a) work out and document appropriate
> > semantics for each hint, because the spec does not provide
> > specifics, and b) perform some extensive benchmarking to understand
> > their value and impact.
>
> That puts the onus on the application running on the client to decide
> the caching semantics of the server, which:
> A. Is a terrible idea™. The application may know how it wants to use
>    the cached data, and may be able to somewhat confidently manage
>    its own pagecache. However, in almost all cases it will have no
>    basis for understanding how the server should manage its cache.
>    The latter really is a job for the sysadmin to figure out.
> B. Is impractical, because even if you can figure out a policy, it
>    requires rewriting the application to manage the server cache.
> C. Will require additional APIs on the NFSv4.2 client to expose the
>    IO_ADVISE operation. You cannot just map it to posix_fadvise()
>    and/or posix_madvise(), because IO_ADVISE is designed to manage a
>    completely different caching layer.
>    At best, we might be able to rally one or two more distributed
>    filesystems to implement similar functionality and share an API;
>    however, there is no chance this API will be useful for ordinary
>    filesystems.

You could map this to RWF_DONTCACHE itself. I know that's really
intended as a hint to the local kernel, but it seems reasonable that if
the application is giving the kernel a DONTCACHE hint, we could pass
that along to the server as well. The server is under no obligation to
do anything with it, just like the kernel with RWF_DONTCACHE.

We could put an IO_ADVISE in a READ or READ_PLUS compound like so:

    PUTFH + IO_ADVISE(IO_ADVISE4_NOREUSE for the ranges being read) + READ_PLUS or READ

...

On the server, we could track those ranges in the compound and enable
RWF_DONTCACHE for any subsequent reads or writes.

All that said, I don't object to some sort of mechanism to turn this on
more globally, particularly since that would allow us to use this with
v3 I/O as well.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
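
As a footnote to the RWF_DONTCACHE discussion above, here is a minimal
userspace sketch of the hint an application would give the (client)
kernel: a buffered write issued with pwritev2(2) and RWF_DONTCACHE.
This is illustrative only: the test file path is hypothetical, the
fallback #define is needed only when the uapi headers predate Linux
6.14 (verify the value against your <linux/fs.h>), and kernels or
filesystems without DONTCACHE support are expected to fail the call
with EOPNOTSUPP rather than accept it silently.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* assumed 6.14 uapi value; check <linux/fs.h> */
#endif

int main(int argc, char **argv)
{
	static char buf[64 * 1024];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	const char *path = argc > 1 ? argv[1] : "dontcache-test"; /* hypothetical file */
	ssize_t ret;
	int fd;

	memset(buf, 'x', sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Buffered write that also asks the kernel to drop these pages
	 * from the page cache once writeback completes, instead of
	 * leaving them around for reclaim to deal with later.
	 */
	ret = pwritev2(fd, &iov, 1, 0, RWF_DONTCACHE);
	if (ret < 0)
		perror("pwritev2(RWF_DONTCACHE)");
	else
		printf("wrote %zd bytes uncached\n", ret);

	close(fd);
	return ret < 0;
}

On the nfsd side, the coarse knob Mike describes presumably amounts to
passing the same flag on the buffered reads and writes the server
already issues, and the per-compound IO_ADVISE4_NOREUSE idea would do
the same only for the advised ranges; neither of those wirings is shown
here.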