> On Jun 17, 2021, at 7:26 PM, trondmy@xxxxxxxxxx wrote: > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > When flushing out the unstable file writes as part of a COMMIT call, try > to perform most of of the data writes and waits outside the semaphore. > > This means that if the client is sending the COMMIT as part of a memory > reclaim operation, then it can continue performing I/O, with contention > for the lock occurring only once the data sync is finished. > > Fixes: 5011af4c698a ("nfsd: Fix stable writes") > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> I can see write throughput improvement now. Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx> This is NFSv3 against an NVMe-backed XFS export, wsize=8192: v5.13-rc6: Command line used: /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 12 processes Each process writes a 1048576 kByte file in 256 kByte records Children see throughput for 12 initial writers = 416004.59 kB/sec Parent sees throughput for 12 initial writers = 415691.65 kB/sec Min throughput per process = 34630.36 kB/sec Max throughput per process = 34703.62 kB/sec Avg throughput per process = 34667.05 kB/sec Min xfer = 1046528.00 kB CPU Utilization: Wall time 30.239 CPU time 5.854 CPU utilization 19.36 % Children see throughput for 12 rewriters = 516605.59 kB/sec Parent sees throughput for 12 rewriters = 516530.05 kB/sec Min throughput per process = 43007.56 kB/sec Max throughput per process = 43074.50 kB/sec Avg throughput per process = 43050.47 kB/sec Min xfer = 1047040.00 kB CPU utilization: Wall time 24.347 CPU time 5.882 CPU utilization 24.16 % v5.13-rc6 + Trond's patch: Command line used: /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. Throughput test with 12 processes Each process writes a 1048576 kByte file in 256 kByte records Children see throughput for 12 initial writers = 434971.09 kB/sec Parent sees throughput for 12 initial writers = 434649.13 kB/sec Min throughput per process = 36209.41 kB/sec Max throughput per process = 36287.55 kB/sec Avg throughput per process = 36247.59 kB/sec Min xfer = 1046528.00 kB CPU Utilization: Wall time 28.920 CPU time 5.705 CPU utilization 19.73 % Children see throughput for 12 rewriters = 544700.37 kB/sec Parent sees throughput for 12 rewriters = 544623.91 kB/sec Min throughput per process = 45320.82 kB/sec Max throughput per process = 45456.07 kB/sec Avg throughput per process = 45391.70 kB/sec Min xfer = 1045504.00 kB CPU utilization: Wall time 23.071 CPU time 5.708 CPU utilization 24.74 % > --- > fs/nfsd/vfs.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 15adf1f6ab21..46485c04740d 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1123,6 +1123,19 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, > } > > #ifdef CONFIG_NFSD_V3 > +static int > +nfsd_filemap_write_and_wait_range(struct nfsd_file *nf, loff_t offset, > + loff_t end) > +{ > + struct address_space *mapping = nf->nf_file->f_mapping; > + int ret = filemap_fdatawrite_range(mapping, offset, end); > + > + if (ret) > + return ret; > + filemap_fdatawait_range_keep_errors(mapping, offset, end); > + return 0; > +} > + > /* > * Commit all pending writes to stable storage. > * > @@ -1153,10 +1166,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, > if (err) > goto out; > if (EX_ISSYNC(fhp->fh_export)) { > - int err2; > + int err2 = nfsd_filemap_write_and_wait_range(nf, offset, end); > > down_write(&nf->nf_rwsem); > - err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); > + if (!err2) > + err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); > switch (err2) { > case 0: > nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, > -- > 2.31.1 > -- Chuck Lever