On Fri, 2024-05-24 at 18:14 +0200, Jan Kara wrote:
> When we are doing WB_SYNC_ALL writeback, nfs submits write requests with
> NFS_FILE_SYNC flag to the server (which then generally treats it as an
> O_SYNC write). This helps to reduce latency for single requests but when
> submitting more requests, additional fsyncs on the server side hurt
> latency. NFS generally avoids this additional overhead by not setting
> NFS_FILE_SYNC if desc->pg_moreio is set.
>
> However this logic doesn't always work. When we do random 4k writes to a huge
> file and then call fsync(2), each page writeback is going to be sent with
> NFS_FILE_SYNC because after preparing one page for writeback, we start writing
> back next, nfs_do_writepage() will call nfs_pageio_cond_complete() which finds
> the page is not contiguous with previously prepared IO and submits it *without*
> setting desc->pg_moreio. Hence NFS_FILE_SYNC is used resulting in poor
> performance.
>
> Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before
> submitting outstanding IO. This improves throughput of
> fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/nfs/pagelist.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 6efb5068c116..040b6b79c75e 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>  					continue;
>  			} else if (index == prev->wb_index + 1)
>  				continue;
> +			/*
> +			 * We will submit more requests after these. Indicate
> +			 * this to the underlying layers.
> +			 */
> +			desc->pg_moreio = 1;
>  			nfs_pageio_complete(desc);
>  			break;
>  		}

Nice work!

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>