On Mon, 2021-06-21 at 15:06 -0400, Trond Myklebust wrote: > On Mon, 2021-06-21 at 18:27 +0000, Chuck Lever III wrote: > > > > > > > On Jun 21, 2021, at 2:00 PM, Trond Myklebust > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > On Mon, 2021-06-21 at 16:49 +0000, Chuck Lever III wrote: > > > > Hi- > > > > > > > > > On Jun 17, 2021, at 7:26 PM, trondmy@xxxxxxxxxx wrote: > > > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > > > When flushing out the unstable file writes as part of a > > > > > COMMIT > > > > > call, try > > > > > to perform most of of the data writes and waits outside the > > > > > semaphore. > > > > > > > > > > This means that if the client is sending the COMMIT as part > > > > > of > > > > > a > > > > > memory > > > > > reclaim operation, then it can continue performing I/O, with > > > > > contention > > > > > for the lock occurring only once the data sync is finished. > > > > > > > > > > Fixes: 5011af4c698a ("nfsd: Fix stable writes") > > > > > Signed-off-by: Trond Myklebust > > > > > <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > The good news is I've found no functional regressions. The bad > > > > news is I haven't seen any difference in performance. Is there > > > > a particular test that I can run to observe improvement? > > > > > > I'd expect that re-exported NFS would be the best test since > > > fsync() is > > > a high latency operation when the page cache is loaded. You also > > > want > > > to use a client with relatively limited memory so that it will > > > try > > > to > > > reclaim memory by pushing out dirty pages and doing COMMIT. > > > > I thought I was hitting those low-memory cases with direct I/O > > testing. <shrug> > > If you do O_DIRECT with 100MB write() calls, then maybe. If you use < > 1MB write()s then you'll rarely if ever hit any COMMIT calls. Let me correct myself: You'd need to do aio/dio with 100MB write system calls so that the flush part of the COMMIT operation takes significant time on the server, and can interfere with other COMMIT operations. > > > > > > > > > I wonder about adding a Fixes: tag for a change that the patch > > > > description describes as an optimization. > > > > > > I've occasionally hit OOM situations in the re-export case when > > > the > > > r/w > > > lock contention causes softerr failure to be serialised. > > > i.e. if the server is down, and you're essentially hoping that > > > the > > > nfsd > > > threads will give up and return EJUKEBOX/NFS4ERR_DELAY to the > > > client, > > > then that lock ensures that threads fail one-by-one (grab lock, > > > write, > > > timeout, release lock) instead of being able to all fail at once > > > (write, timeout). > > > > > > > > > > > > > > > > --- > > > > > fs/nfsd/vfs.c | 18 ++++++++++++++++-- > > > > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > > > > index 15adf1f6ab21..46485c04740d 100644 > > > > > --- a/fs/nfsd/vfs.c > > > > > +++ b/fs/nfsd/vfs.c > > > > > @@ -1123,6 +1123,19 @@ nfsd_write(struct svc_rqst *rqstp, > > > > > struct > > > > > svc_fh *fhp, loff_t offset, > > > > > } > > > > > > > > > > #ifdef CONFIG_NFSD_V3 > > > > > +static int > > > > > +nfsd_filemap_write_and_wait_range(struct nfsd_file *nf, > > > > > loff_t > > > > > offset, > > > > > + loff_t end) > > > > > +{ > > > > > + struct address_space *mapping = nf->nf_file- > > > > > >f_mapping; > > > > > + int ret = filemap_fdatawrite_range(mapping, offset, > > > > > end); > > > > > + > > > > > + if (ret) > > > > > + return ret; > > > > > + filemap_fdatawait_range_keep_errors(mapping, offset, > > > > > end); > > > > > + return 0; > > > > > +} > > > > > + > > > > > /* > > > > > * Commit all pending writes to stable storage. > > > > > * > > > > > @@ -1153,10 +1166,11 @@ nfsd_commit(struct svc_rqst *rqstp, > > > > > struct > > > > > svc_fh *fhp, > > > > > if (err) > > > > > goto out; > > > > > if (EX_ISSYNC(fhp->fh_export)) { > > > > > - int err2; > > > > > + int err2 = > > > > > nfsd_filemap_write_and_wait_range(nf, > > > > > offset, end); > > > > > > > > > > down_write(&nf->nf_rwsem); > > > > > - err2 = vfs_fsync_range(nf->nf_file, offset, > > > > > end, > > > > > 0); > > > > > + if (!err2) > > > > > + err2 = vfs_fsync_range(nf->nf_file, > > > > > offset, > > > > > end, 0); > > > > > switch (err2) { > > > > > case 0: > > > > > nfsd_copy_boot_verifier(verf, > > > > > net_generic(nf->nf_net, > > > > > -- > > > > > 2.31.1 > > > > > > > > > > > > > -- > > > > Chuck Lever > > > > > > > > > > > > > > > > > > -- > > > Trond Myklebust > > > Linux NFS client maintainer, Hammerspace > > > trond.myklebust@xxxxxxxxxxxxxxx > > > > -- > > Chuck Lever > > > > > > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx