[ Adding NFSD reviewers ... ]

On 2/20/25 12:12 PM, Mike Snitzer wrote:
> Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> by nfsd will be removed from the page cache upon completion."
>
> nfsd_dontcache is disabled by default. It may be enabled with:
>
>   echo Y > /sys/module/nfsd/parameters/nfsd_dontcache

A per-export setting like an export option would be nicer. Also, does
it make sense to have a separate control for READ and another for
WRITE? My trick knee suggests caching read results is still going to
add significant value, but write, not so much.

However, to add any such administrative control, I'd like to see some
performance numbers. I think we need to enumerate the cases (I/O
types) that are most interesting to examine: small-memory NFS servers;
lots of small unaligned I/O; server-side CPU per byte; storage
interrupt rates; any others?

And let's see some user/admin documentation (e.g., when should this
setting be enabled? When would it be contra-indicated?).

The same arguments that applied to Cedric's request to make the
maximum RPC size a tunable setting apply here. Do we want to carry a
manual setting for this mechanism for a long time, or do we expect
that the setting can become automatic/uninteresting after a period of
experimentation?

* It might be argued that putting these experimental tunables under
/sys eliminates the support-longevity question, since there aren't
strict rules about removing files under /sys.

> FOP_DONTCACHE must be advertised as supported by the underlying
> filesystem (e.g. XFS), otherwise if/when 'nfsd_dontcache' is enabled
> all IO will fail with -EOPNOTSUPP.

It would be better all around if NFSD simply ignored the setting in
the cases where the underlying file system doesn't implement
DONTCACHE.
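For instance, NFSD could gate the flag on the file system's advertised
support. A minimal sketch (untested; the helper name
nfsd_dontcache_flags() is mine, and it assumes the FOP_DONTCACHE
fop_flags bit that this patch already depends on):

static rwf_t nfsd_dontcache_flags(struct file *file)
{
	/*
	 * Honor the modparam only when the underlying file system
	 * advertises DONTCACHE support; otherwise fall back to
	 * normal cached I/O instead of failing with -EOPNOTSUPP.
	 */
	if (nfsd_dontcache && (file->f_op->fop_flags & FOP_DONTCACHE))
		return RWF_DONTCACHE;
	return 0;
}

The read and write paths below would then OR in
nfsd_dontcache_flags(file) instead of testing the modparam directly.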
> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> ---
>  fs/nfsd/vfs.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 29cb7b812d71..d7e49004e93d 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -955,6 +955,11 @@ nfsd_open_verified(struct svc_fh *fhp, int may_flags, struct file **filp)
>  	return __nfsd_open(fhp, S_IFREG, may_flags, filp);
>  }
>
> +static bool nfsd_dontcache __read_mostly = false;
> +module_param(nfsd_dontcache, bool, 0644);
> +MODULE_PARM_DESC(nfsd_dontcache,
> +		 "Any data read or written by nfsd will be removed from the page cache upon completion.");
> +
>  /*
>   * Grab and keep cached pages associated with a file in the svc_rqst
>   * so that they can be passed to the network sendmsg routines
> @@ -1084,6 +1089,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	loff_t ppos = offset;
>  	struct page *page;
>  	ssize_t host_err;
> +	rwf_t flags = 0;
>
>  	v = 0;
>  	total = *count;
> @@ -1097,9 +1103,12 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	}
>  	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
>
> +	if (nfsd_dontcache)
> +		flags |= RWF_DONTCACHE;
> +
>  	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>  	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> -	host_err = vfs_iter_read(file, &iter, &ppos, 0);
> +	host_err = vfs_iter_read(file, &iter, &ppos, flags);
>  	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
>  }
>
> @@ -1186,6 +1195,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>  	if (stable && !fhp->fh_use_wgather)
>  		flags |= RWF_SYNC;
>
> +	if (nfsd_dontcache)
> +		flags |= RWF_DONTCACHE;
> +
>  	iov_iter_kvec(&iter, ITER_SOURCE, vec, vlen, *cnt);
>  	since = READ_ONCE(file->f_wb_err);
>  	if (verf)
> @@ -1237,6 +1249,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>  	 */
>  bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
>  {
> +	if (nfsd_dontcache) /* force the use of vfs_iter_read for reads */
> +		return false;
> +

Urgh. So I've been mulling over simply removing the splice read path.

- Less code, less complexity, smaller test matrix
- How much of a performance loss would result?
- Would such a change make it easier to pass whole folios from the
  file system directly to the network layer?

>  	switch (svc_auth_flavor(rqstp)) {
>  	case RPC_AUTH_GSS_KRB5I:
>  	case RPC_AUTH_GSS_KRB5P:

--
Chuck Lever