On Tue, Jun 18, 2024 at 06:33:13PM +0300, Dan Aloni wrote:
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1315,7 +1315,10 @@ static int nfs_can_extend_write(struct file *file, struct folio *folio,
>  	struct file_lock_context *flctx = locks_inode_context(inode);
>  	struct file_lock *fl;
>  	int ret;
> +	unsigned int mntflags = NFS_SERVER(inode)->flags;
>  
> +	if (mntflags & NFS_MOUNT_NO_EXTEND)
> +		return 0;
>  	if (file->f_flags & O_DSYNC)
>  		return 0;
>  	if (!nfs_folio_write_uptodate(folio, pagelen))

I find the logic in nfs_update_folio to extend the write to the
entire folio rather weird, and especially bad with the larger folio
support I just added.  It makes the client write more than what the
application asked for, and with large page sizes or large folios
potentially a lot more.

The comment above nfs_can_extend_write suggests it is done to avoid
"fragmentation".  My immediate reaction assumed that would be about
file system fragmentation, which seems odd given that I'd expect
servers to either log data, in which case this just increases write
amplification for no good reason, or use something like the Linux
page cache, in which case it would be entirely pointless.

But when following git blame over a few rounds of fixes (which all
narrow the scope of this optimization because it caused problems),
the "fragmentation" eventually becomes:

	/* If we're not using byte range locks, and we know the page
	 * is entirely in cache, it may be more efficient to avoid
	 * fragmenting write requests.
	 */

which to me suggests it is about struct nfs_page and the on-the-wire
RPCs.  In that case the merging of consecutive I/O in
nfs_try_to_update_request should take care of all the interesting
cases.

In other words: I strongly suspect everyone is better off if this
write-extending behavior is removed, or at least is not the default.
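
To make that last point concrete, here is a minimal user-space sketch
of the kind of consecutive-range coalescing I mean.  struct wreq and
wreq_try_merge are hypothetical stand-ins for illustration, not the
actual struct nfs_page or the logic in nfs_try_to_update_request:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a pending write request (struct
 * nfs_page in the kernel): bytes [offset, offset + bytes) of a
 * single folio that still have to go out on the wire. */
struct wreq {
	unsigned int offset;
	unsigned int bytes;
};

/*
 * Coalesce a new write into an existing request when the two byte
 * ranges overlap or are contiguous; return false if they are
 * disjoint and would need separate requests.
 */
static bool wreq_try_merge(struct wreq *req, unsigned int offset,
			   unsigned int bytes)
{
	unsigned int end = offset + bytes;
	unsigned int req_end = req->offset + req->bytes;

	if (offset > req_end || end < req->offset)
		return false;

	if (offset < req->offset)
		req->offset = offset;
	if (end > req_end)
		req_end = end;
	req->bytes = req_end - req->offset;
	return true;
}

int main(void)
{
	struct wreq req = { .offset = 0, .bytes = 512 };

	/* A second 512-byte write starting at byte 512 is contiguous
	 * with the first, so it extends the pending request to 1024
	 * bytes instead of creating a second one. */
	if (wreq_try_merge(&req, 512, 512))
		printf("merged: offset=%u bytes=%u\n",
		       req.offset, req.bytes);
	return 0;
}

Two sub-folio writes that abut end up as one request covering exactly
the bytes written, so the on-the-wire RPC is not fragmented and
nothing beyond what the application actually wrote gets sent.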