Re: [PATCH] nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO

[ Adding NFSD reviewers ... ]

On 2/20/25 12:12 PM, Mike Snitzer wrote:
> Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> by nfsd will be removed from the page cache upon completion."
> 
> nfsd_dontcache is disabled by default.  It may be enabled with:
>   echo Y > /sys/module/nfsd/parameters/nfsd_dontcache

A per-export setting like an export option would be nicer. Also, does
it make sense to have a separate control for READ and another for
WRITE? My trick knee suggests caching read results is still going to
add significant value, but for writes, not so much.
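
If we do keep a modparam while experimenting, splitting the control
might look something like this (untested sketch; the parameter names
are only illustrative):

	/* Untested: separate knobs for the read and write paths */
	static bool nfsd_dontcache_read __read_mostly;
	module_param(nfsd_dontcache_read, bool, 0644);
	MODULE_PARM_DESC(nfsd_dontcache_read,
		"Drop data read by nfsd from the page cache on completion");

	static bool nfsd_dontcache_write __read_mostly;
	module_param(nfsd_dontcache_write, bool, 0644);
	MODULE_PARM_DESC(nfsd_dontcache_write,
		"Drop data written by nfsd from the page cache on completion");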

However, before adding any such administrative control, I'd like to
see some performance numbers. I think we need to enumerate the cases
(I/O types) that are most interesting to examine: small-memory NFS
servers; lots of small unaligned I/O; server-side CPU cost per byte;
storage interrupt rates; any others?

And let's see some user/admin documentation (e.g., when should this
setting be enabled? When would it be contraindicated?)

The same arguments that applied to Cedric's request to make maximum RPC
size a tunable setting apply here. Do we want to carry a manual setting
for this mechanism for a long time, or do we expect that the setting can
become automatic/uninteresting after a period of experimentation?

* It might be argued that putting these experimental tunables under /sys
  eliminates the support longevity question, since there aren't strict
  rules about removing files under /sys.


> FOP_DONTCACHE must be advertised as supported by the underlying
> filesystem (e.g. XFS), otherwise if/when 'nfsd_dontcache' is enabled
> all IO will fail with -EOPNOTSUPP.

It would be better all around if NFSD simply ignored the setting when
the underlying file system doesn't implement DONTCACHE.
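
Perhaps gate the flag on the exported file system's advertised
support instead of failing the I/O. An untested sketch, assuming the
FOP_DONTCACHE bit in fop_flags is how support is advertised:

	/* Untested: fall back to normal cached I/O when the
	 * underlying file system does not support dontcache.
	 */
	if (nfsd_dontcache &&
	    (file->f_op->fop_flags & FOP_DONTCACHE))
		flags |= RWF_DONTCACHE;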


> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> ---
>  fs/nfsd/vfs.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 29cb7b812d71..d7e49004e93d 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -955,6 +955,11 @@ nfsd_open_verified(struct svc_fh *fhp, int may_flags, struct file **filp)
>  	return __nfsd_open(fhp, S_IFREG, may_flags, filp);
>  }
>  
> +static bool nfsd_dontcache __read_mostly = false;
> +module_param(nfsd_dontcache, bool, 0644);
> +MODULE_PARM_DESC(nfsd_dontcache,
> +		 "Any data read or written by nfsd will be removed from the page cache upon completion.");
> +
>  /*
>   * Grab and keep cached pages associated with a file in the svc_rqst
>   * so that they can be passed to the network sendmsg routines
> @@ -1084,6 +1089,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	loff_t ppos = offset;
>  	struct page *page;
>  	ssize_t host_err;
> +	rwf_t flags = 0;
>  
>  	v = 0;
>  	total = *count;
> @@ -1097,9 +1103,12 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	}
>  	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
>  
> +	if (nfsd_dontcache)
> +		flags |= RWF_DONTCACHE;
> +
>  	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>  	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> -	host_err = vfs_iter_read(file, &iter, &ppos, 0);
> +	host_err = vfs_iter_read(file, &iter, &ppos, flags);
>  	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
>  }
>  
> @@ -1186,6 +1195,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>  	if (stable && !fhp->fh_use_wgather)
>  		flags |= RWF_SYNC;
>  
> +	if (nfsd_dontcache)
> +		flags |= RWF_DONTCACHE;
> +
>  	iov_iter_kvec(&iter, ITER_SOURCE, vec, vlen, *cnt);
>  	since = READ_ONCE(file->f_wb_err);
>  	if (verf)
> @@ -1237,6 +1249,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>   */
>  bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
>  {
> +	if (nfsd_dontcache) /* force the use of vfs_iter_read for reads */
> +		return false;
> +

Urgh.

So I've been mulling over simply removing the splice read path
(rough sketch below).

 - Less code, less complexity, smaller test matrix

 - How much of a performance loss would result?

 - Would such a change make it easier to pass whole folios from
   the file system directly to the network layer?
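
If the splice path went away, nfsd_read_splice_ok() and
nfsd_splice_read() would be deleted, and the read entry point would
unconditionally take the iterator path. Roughly (untested, and
assuming the current nfsd_iter_read() signature):

	/* Untested sketch: no more branching on splice support */
	err = nfsd_iter_read(rqstp, fhp, file, offset, count, 0, eof);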


>  	switch (svc_auth_flavor(rqstp)) {
>  	case RPC_AUTH_GSS_KRB5I:
>  	case RPC_AUTH_GSS_KRB5P:


-- 
Chuck Lever