On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> On Thu, Feb 20, 2025 at 01:17:42PM -0500, Chuck Lever wrote:
> > [ Adding NFSD reviewers ... ]
> > 
> > On 2/20/25 12:12 PM, Mike Snitzer wrote:
> > > Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> > > by nfsd will be removed from the page cache upon completion."
> > > 
> > > nfsd_dontcache is disabled by default. It may be enabled with:
> > >   echo Y > /sys/module/nfsd/parameters/nfsd_dontcache
> > 
> > A per-export setting like an export option would be nicer. Also, does
> > it make sense to have a separate control for READ and another for
> > WRITE? My trick knee suggests caching read results is still going to
> > add significant value, but write, not so much.
> 
> My intent was to make 6.14's DONTCACHE feature testable in the context
> of nfsd in a no-frills way. I realize adding the nfsd_dontcache knob
> skews toward too raw and lacks polish. But I'm inclined to expose such
> coarse-grained opt-in knobs to encourage others' discovery (and answers
> to some of the questions you pose below). I also hope to enlist all
> NFSD reviewers' help in categorizing/documenting where DONTCACHE
> helps/hurts. ;)
> 
> And I agree that ultimately per-export control is needed. I'll take
> the time to implement that, hopeful to have something more suitable
> in time for LSF.
> 

Would it make more sense to hook DONTCACHE up to the IO_ADVISE
operation in RFC 7862? IO_ADVISE4_NOREUSE sounds like it has a similar
meaning. That would give the clients a way to do this on a per-open
basis.

> > However, to add any such administrative control, I'd like to see some
> > performance numbers. I think we need to enumerate the cases (I/O types)
> > that are most interesting to examine: small-memory NFS servers; lots of
> > small unaligned I/O; server-side CPU per byte; storage interrupt rates;
> > any others?
> > 
> > And let's see some user/admin documentation (e.g. when should this
> > setting be enabled? when would it be contra-indicated?)
> > 
> > The same arguments that applied to Cedric's request to make maximum RPC
> > size a tunable setting apply here. Do we want to carry a manual setting
> > for this mechanism for a long time, or do we expect that the setting can
> > become automatic/uninteresting after a period of experimentation?
> > 
> > * It might be argued that putting these experimental tunables under /sys
> >   eliminates the support longevity question, since there aren't strict
> >   rules about removing files under /sys.

Isn't /sys covered by the same ABI guarantees? I know debugfs isn't,
but I'm not sure about /sys.

> 
> Right, I do think a sysfs knob (that defaults to disabled, requires
> user opt-in) is a pretty useful and benign means to expose
> experimental functionality.
> 
> And I agree with everything you said is needed above. I haven't had
> the time to focus on DONTCACHE since ~December; I just picked up my
> old patches from that time and decided to send the NFSD one since
> DONTCACHE has been merged for 6.14.
> 
> > 
> > > FOP_DONTCACHE must be advertised as supported by the underlying
> > > filesystem (e.g. XFS), otherwise if/when 'nfsd_dontcache' is enabled
> > > all IO will fail with -EOPNOTSUPP.
> > 
> > It would be better all around if NFSD simply ignored the setting in the
> > cases where the underlying file system doesn't implement DONTCACHE.
> 
> I'll work on making it so.
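Something along these lines might be all that's needed to make the knob
harmless on filesystems that don't advertise DONTCACHE support (totally
untested sketch, and nfsd_dontcache_rwf() is just a helper name I made
up for illustration):

	/*
	 * Only ask for uncached I/O when the underlying filesystem
	 * advertises FOP_DONTCACHE, so the modparam silently becomes
	 * a no-op instead of turning every READ/WRITE into -EOPNOTSUPP.
	 */
	static inline rwf_t nfsd_dontcache_rwf(struct file *file)
	{
		if (nfsd_dontcache && (file->f_op->fop_flags & FOP_DONTCACHE))
			return RWF_DONTCACHE;
		return 0;
	}

The read and write paths could then do "flags |= nfsd_dontcache_rwf(file);"
instead of testing nfsd_dontcache directly, and nfsd_read_splice_ok()
could use the same helper to decide whether to force vfs_iter_read().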
> 
> > 
> > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/vfs.c | 17 ++++++++++++++++-
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 29cb7b812d71..d7e49004e93d 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -955,6 +955,11 @@ nfsd_open_verified(struct svc_fh *fhp, int may_flags, struct file **filp)
> > >  	return __nfsd_open(fhp, S_IFREG, may_flags, filp);
> > >  }
> > >  
> > > +static bool nfsd_dontcache __read_mostly = false;
> > > +module_param(nfsd_dontcache, bool, 0644);
> > > +MODULE_PARM_DESC(nfsd_dontcache,
> > > +		 "Any data read or written by nfsd will be removed from the page cache upon completion.");
> > > +
> > >  /*
> > >   * Grab and keep cached pages associated with a file in the svc_rqst
> > >   * so that they can be passed to the network sendmsg routines
> > > @@ -1084,6 +1089,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  	loff_t ppos = offset;
> > >  	struct page *page;
> > >  	ssize_t host_err;
> > > +	rwf_t flags = 0;
> > >  
> > >  	v = 0;
> > >  	total = *count;
> > > @@ -1097,9 +1103,12 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  	}
> > >  	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
> > >  
> > > +	if (nfsd_dontcache)
> > > +		flags |= RWF_DONTCACHE;
> > > +
> > >  	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> > >  	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> > > -	host_err = vfs_iter_read(file, &iter, &ppos, 0);
> > > +	host_err = vfs_iter_read(file, &iter, &ppos, flags);
> > >  	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
> > >  }
> > >  
> > > @@ -1186,6 +1195,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> > >  	if (stable && !fhp->fh_use_wgather)
> > >  		flags |= RWF_SYNC;
> > >  
> > > +	if (nfsd_dontcache)
> > > +		flags |= RWF_DONTCACHE;
> > > +
> > >  	iov_iter_kvec(&iter, ITER_SOURCE, vec, vlen, *cnt);
> > >  	since = READ_ONCE(file->f_wb_err);
> > >  	if (verf)
> > > @@ -1237,6 +1249,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> > >   */
> > >  bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
> > >  {
> > > +	if (nfsd_dontcache) /* force the use of vfs_iter_read for reads */
> > > +		return false;
> > > +
> > 
> > Urgh.
> 
> Heh, yeah, bypassing splice was needed given that dontcache hooks off
> vfs_iter_read.
> 
> > So I've been mulling over simply removing the splice read path.
> > 
> > - Less code, less complexity, smaller test matrix
> > 
> > - How much of a performance loss would result?
> > 
> > - Would such a change make it easier to pass whole folios from
> >   the file system directly to the network layer?
> 
> Good to know.
> 
-- 
Jeff Layton <jlayton@xxxxxxxxxx>