On Tue, Jul 02, 2024 at 11:07:13AM +0300, Sagi Grimberg wrote:
> On 01/07/2024 8:26, Christoph Hellwig wrote:
>> When NFS requests are split into sub-requests, nfs_inode_remove_request
>> calls nfs_page_group_sync_on_bit to set PG_REMOVE on this sub-request and
>> only completes the head requests once PG_REMOVE is set on all requests.
>> This means that when nfs_lock_and_join_requests sees a PG_REMOVE bit, I/O
>> on the request is in progress and has partially completed. If such a
>> request is returned to nfs_try_to_update_request, it could be extended
>> with the newly dirtied region and I/O for the combined range will be
>> re-scheduled, leading to extra I/O.
>
> Probably worth noting in the change log that large folios makes this
> potentially much worse?

That assumes large folios actually create more subrequests. One big
reason to create subrequests is flexfiles mirroring, which of course
doesn't change with large folios. The other is when ->pg_test doesn't
allow the nfs_page to cover everything, which is roughly bounded by a
page array allocation and, for pNFS, the layout segment size. The
chance of that check failing could very slightly increase with large
folios.
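
For readers not familiar with the page group handling, the PG_REMOVE
synchronization described in the quoted change log can be sketched as a
toy user-space model. This is not the kernel code: struct toy_req,
TOY_PG_REMOVE and both helpers are hypothetical names made up for
illustration, and locking and reference counting are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-in for the PG_REMOVE flag bit. */
enum { TOY_PG_REMOVE = 1u << 0 };

/* Hypothetical stand-in for a sub-request in a page group. */
struct toy_req {
	unsigned int flags;
	struct toy_req *next;	/* circular list linking the whole group */
};

/*
 * Set @bit on @req, then report whether every request in the group
 * now has @bit set.  Only the caller that sets the last missing bit
 * sees true and may go on to complete the head request.
 */
static bool toy_group_sync_on_bit(struct toy_req *req, unsigned int bit)
{
	struct toy_req *tmp;

	assert(req != NULL);
	req->flags |= bit;
	for (tmp = req->next; tmp != req; tmp = tmp->next)
		if (!(tmp->flags & bit))
			return false;
	return true;
}

/*
 * A set TOY_PG_REMOVE bit means removal has started on this request,
 * i.e. I/O on the group is in progress and has partially completed.
 */
static bool toy_removal_in_progress(const struct toy_req *req)
{
	return (req->flags & TOY_PG_REMOVE) != 0;
}
```

In this model a group of two sub-requests returns false from the first
toy_group_sync_on_bit call and true from the second. Between the two
calls the first request already carries the bit, which is the window
the change log is concerned with.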