On Thu, Jul 28, 2022 at 3:48 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 28, 2022 at 03:18:03PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 28, 2022 at 01:10:16PM +0200, Jan Kara wrote:
> > > Hi Christoph!
> > >
> > > On Tue 19-07-22 06:13:07, Christoph Hellwig wrote:
> > > > this series removes iomap_writepage and it's callers, following what xfs
> > > > has been doing for a long time.
> > >
> > > So this effectively means "no writeback from page reclaim for these
> > > filesystems" AFAICT (page migration of dirty pages seems to be handled by
> > > iomap_migrate_page()) which is going to make life somewhat harder for
> > > memory reclaim when memory pressure is high enough that dirty pages are
> > > reaching end of the LRU list. I don't expect this to be a problem on big
> > > machines but it could have some undesirable effects for small ones
> > > (embedded, small VMs). I agree per-page writeback has been a bad idea for
> > > efficiency reasons for at least last 10-15 years and most filesystems
> > > stopped dealing with more complex situations (like block allocation) from
> > > ->writepage() already quite a few years ago without any bug reports AFAIK.
> > > So it all seems like a sensible idea from FS POV but are MM people on board
> > > or at least aware of this movement in the fs land?
> >
> > I mentioned it during my folio session at LSFMM, but didn't put a huge
> > emphasis on it.
> >
> > For XFS, writeback should already be in progress on other pages if
> > we're getting to the point of trying to call ->writepage() in vmscan.
> > Surely this is also true for other filesystems?
>
> Yes.
>
> It's definitely true for btrfs, too, because btrfs_writepage does:
>
> static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> {
>         struct inode *inode = page->mapping->host;
>         int ret;
>
>         if (current->flags & PF_MEMALLOC) {
>                 redirty_page_for_writepage(wbc, page);
>                 unlock_page(page);
>                 return 0;
>         }
> ....
>
> It also rejects all calls to write dirty pages from memory reclaim
> contexts.

Aha, it seems even kswapd (it has PF_MEMALLOC set) is rejected too.

>
> ext4 will also reject writepage calls from memory allocation if
> block allocation is required (due to delayed allocation) or
> unwritten extents need converting to written. i.e. if it has to run
> blocking transactions.
>
> So all three major filesystems will either partially or wholly
> reject ->writepage calls from memory reclaim context.
>
> IOWs, if memory reclaim is depending on ->writepage() to make
> reclaim progress, it's not working as advertised on the vast
> majority of production Linux systems....
>
> The reality is that ->writepage is a relic of a bygone era of OS and
> filesystem design. It was useful in the days where writing a dirty
> page just involved looking up the bufferhead attached to the page to
> get the disk mapping and then submitting it for IO.
>
> Those days are long gone - filesystems have complex IO submission
> paths now that have to handle delayed allocation, copy-on-write,
> unwritten extents, have unbound memory demand, etc. All the
> filesystems that support these 1990s era filesystem technologies
> simply turn off ->writepage in memory reclaim contexts.
>
> Hence for the vast majority of linux users (i.e. everyone using
> ext4, btrfs and XFS), ->writepage no longer plays any part in memory
> reclaim on their systems.
>
> So why should we try to maintain the fiction that ->writepage is
> required functionality in a filesystem when it clearly isn't?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
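
To make the two rejection rules described above concrete, here is a minimal userspace sketch of them. It is not kernel code: every name prefixed with sketch_ or SKETCH_ is hypothetical, the flag value is made up, and the structs are simplified stand-ins for the task and page state. It only models the decision "redirty and hand the page back" versus "submit IO", under the assumption that ->writepage is entered with the page locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PF_MEMALLOC; not the kernel's value. */
#define SKETCH_PF_MEMALLOC 0x1u

/* Simplified stand-ins for the calling task and the page being written. */
struct sketch_task { unsigned int flags; };
struct sketch_page { bool dirty; bool locked; };

enum sketch_result { SKETCH_SUBMITTED, SKETCH_REDIRTIED };

/* Hand the page back to the caller still dirty, as the quoted btrfs code does. */
static enum sketch_result sketch_redirty(struct sketch_page *page)
{
	page->dirty = true;
	page->locked = false;
	return SKETCH_REDIRTIED;
}

/* Pretend the IO was submitted and the page was cleaned. */
static enum sketch_result sketch_submit(struct sketch_page *page)
{
	page->dirty = false;
	page->locked = false;
	return SKETCH_SUBMITTED;
}

/* btrfs-like rule: reject every call made from a memory reclaim context. */
enum sketch_result sketch_btrfs_like_writepage(const struct sketch_task *cur,
					       struct sketch_page *page)
{
	assert(page->locked);
	if (cur->flags & SKETCH_PF_MEMALLOC)
		return sketch_redirty(page);
	return sketch_submit(page);
}

/*
 * ext4-like rule: reject only when the write would need a blocking
 * transaction (delayed allocation or unwritten extent conversion).
 */
enum sketch_result sketch_ext4_like_writepage(const struct sketch_task *cur,
					      struct sketch_page *page,
					      bool needs_transaction)
{
	assert(page->locked);
	if ((cur->flags & SKETCH_PF_MEMALLOC) && needs_transaction)
		return sketch_redirty(page);
	return sketch_submit(page);
}
```

In this model a caller with SKETCH_PF_MEMALLOC set (reclaim, including kswapd) never gets a page cleaned by the btrfs-like path, and gets one cleaned by the ext4-like path only when no transaction is needed.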