On Fri, 2024-05-24 at 18:14 +0200, Jan Kara wrote:
> When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
> NFS_FILE_SYNC flag to the server (which then generally treats it as an
> O_SYNC write). This helps to reduce latency for single requests but when
> submitting more requests, additional fsyncs on the server side hurt
> latency. NFS generally avoids this additional overhead by not setting
> NFS_FILE_SYNC if desc->pg_moreio is set.
>
> However this logic doesn't always work. When we do random 4k writes to a huge
> file and then call fsync(2), each page writeback is going to be sent with
> NFS_FILE_SYNC because after preparing one page for writeback, we start writing
> back the next one, and nfs_do_writepage() will call nfs_pageio_cond_complete(),
> which finds the page is not contiguous with previously prepared IO and submits
> it *without* setting desc->pg_moreio. Hence NFS_FILE_SYNC is used, resulting in
> poor performance.
>
> Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before
> submitting outstanding IO. This improves throughput of
> fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/nfs/pagelist.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 6efb5068c116..040b6b79c75e 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>  			continue;
>  		} else if (index == prev->wb_index + 1)
>  			continue;
> +		/*
> +		 * We will submit more requests after these. Indicate
> +		 * this to the underlying layers.
> +		 */
> +		desc->pg_moreio = 1;
>  		nfs_pageio_complete(desc);
>  		break;
>  	}

Thanks!

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
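
[Editor's note: for readers less familiar with the NFS writeback path, below is a small standalone C sketch of the stable-write decision the patch influences. It is illustrative only and not kernel code: choose_stable_how(), the SKETCH_* enum values and main() are invented for this example; the only behaviour taken from the discussion above is the rule that a request should not be forced to NFS_FILE_SYNC when more IO is about to follow (desc->pg_moreio).]

/*
 * Illustrative sketch only -- not kernel code.  choose_stable_how(), the
 * enum values and main() are invented; the rule modelled here is "more IO
 * pending (pg_moreio) => do not force a per-request FILE_SYNC write".
 */
#include <stdio.h>

enum stable_how { SKETCH_UNSTABLE, SKETCH_FILE_SYNC };

/* More requests coming: send this one unstable and commit once at the end.
 * Lone request: send it FILE_SYNC so it is durable in a single round trip. */
static enum stable_how choose_stable_how(int pg_moreio)
{
	return pg_moreio ? SKETCH_UNSTABLE : SKETCH_FILE_SYNC;
}

int main(void)
{
	/* Before the patch, a non-contiguous page hit during fsync writeback
	 * completed the descriptor with pg_moreio still 0, so every 4k write
	 * behaved like the first case below; the patch sets pg_moreio first. */
	printf("pg_moreio=0 -> %s\n",
	       choose_stable_how(0) == SKETCH_FILE_SYNC ? "FILE_SYNC" : "UNSTABLE");
	printf("pg_moreio=1 -> %s\n",
	       choose_stable_how(1) == SKETCH_FILE_SYNC ? "FILE_SYNC" : "UNSTABLE");
	return 0;
}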