On Thu, Jul 28, 2022 at 01:10:16PM +0200, Jan Kara wrote:
> Hi Christoph!
>
> On Tue 19-07-22 06:13:07, Christoph Hellwig wrote:
> > this series removes iomap_writepage and it's callers, following what xfs
> > has been doing for a long time.
>
> So this effectively means "no writeback from page reclaim for these
> filesystems" AFAICT (page migration of dirty pages seems to be handled by
> iomap_migrate_page()) which is going to make life somewhat harder for
> memory reclaim when memory pressure is high enough that dirty pages are
> reaching end of the LRU list. I don't expect this to be a problem on big
> machines but it could have some undesirable effects for small ones
> (embedded, small VMs). I agree per-page writeback has been a bad idea for
> efficiency reasons for at least last 10-15 years and most filesystems
> stopped dealing with more complex situations (like block allocation) from
> ->writepage() already quite a few years ago without any bug reports AFAIK.
> So it all seems like a sensible idea from FS POV but are MM people on board
> or at least aware of this movement in the fs land?
>
> Added a few CC's for that.
>

There is some context missing because it's not clear what the full impact
is, but it is definitely the case that writepage is ignored in some contexts
for common filesystems, so let's assume, as a worst-case scenario, that
writepage from reclaim context always fails. Certainly this type of change
is something linux-mm needs to be aware of because we've been blindsided
before.

I don't think it would be incredibly damaging, although there *might* be
issues with small systems or cgroups. In many respects, vmscan has been
moving in this direction for a long time, e.g. f84f6e2b0868 ("mm: vmscan: do
not writeback filesystem pages in kswapd except in high priority") and
e2be15f6c3ee ("mm: vmscan: stall page reclaim and writeback pages based on
dirty/writepage pages encountered").
This was roughly 10 years ago, when it was clear that FS writeback from
reclaim context was fragile (iirc it was partially due to concerns about
stack depth and later concerns that a filesystem would simply ignore the
writeback request). There is also less reliance on stalling reclaim by
queueing and waiting on writeback; instead, reclaim now explicitly throttles
if no progress is being made, e.g. 69392a403f49 ("mm/vmscan: throttle reclaim
when no progress is being made") and some follow-up fixes.

There is still a reliance on swap, and shmem pages will not ignore
writepage, but I assume that is still ok.

One potential caveat is if wakeup_flusher_threads() is ignored. Reclaim
relies on the assumption that, if all the pages at the tail of the LRU are
dirty pages not queued for writeback, then wakeup_flusher_threads() will do
something, so that those pages get marked for immediate reclaim when
writeback completes. Of course there is no guarantee that the flusher
threads will start writeback on the old pages, and the pages could be backed
by a slow BDI, but wakeup_flusher_threads() should not be ignored.

Another caveat is that comments become misleading. Take, for example, the
comment "Only kswapd can writeback filesystem folios to avoid risk of stack
overflow." The wording should change to note that writepage may do nothing
at all. There also might need to be some adjustment to when pages get marked
for immediate reclaim when they are dirty but not under writeback. pageout()
might need a tracepoint for "mapping->a_ops->writepage == NULL" to help debug
problems around a failure to queue pages for writeback, although that could
be done as a test patch for a bug.
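To make the worst case concrete, here is a toy user-space model (not kernel
code) of one pass over the tail of the file LRU when ->writepage may be
absent. The names toy_folio, toy_scan_result and toy_shrink_tail() are
hypothetical; the reclaim flag is only a loose stand-in for PG_reclaim, and
queueing in the has_writepage branch stands in for what pageout() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified folio near the tail of the file LRU.
 * None of these names are kernel APIs. */
struct toy_folio {
    bool dirty;
    bool writeback;     /* already under writeback */
    bool has_writepage; /* mapping supplies ->writepage (false after the iomap change) */
    bool reclaim;       /* loose stand-in for "reclaim when writeback completes" */
};

struct toy_scan_result {
    size_t freed;          /* clean folios that can be freed now */
    size_t queued;         /* dirty folios handed to ->writepage */
    size_t dirty_unqueued; /* dirty, not under writeback, nothing queued */
    bool wake_flushers;    /* reclaim must rely on the flusher threads */
    bool no_progress;      /* nothing freed: a NOPROGRESS throttling candidate */
};

static struct toy_scan_result toy_shrink_tail(struct toy_folio *folios, size_t nr)
{
    struct toy_scan_result res = {0};

    assert(folios != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        struct toy_folio *f = &folios[i];

        if (f->writeback) {
            f->reclaim = true;   /* free it once writeback completes */
        } else if (!f->dirty) {
            res.freed++;
        } else if (f->has_writepage) {
            f->writeback = true; /* the pageout() path queued it */
            f->reclaim = true;
            res.queued++;
        } else {
            /* ->writepage is absent, so nothing is queued. A tracepoint
             * at this point would make the case visible when debugging. */
            res.dirty_unqueued++;
        }
    }

    /* Everything scanned is dirty and unqueued: only the flushers can make
     * these folios reclaimable, so waking them must not be skipped. */
    res.wake_flushers = nr > 0 && res.dirty_unqueued == nr;
    res.no_progress = res.freed == 0;
    return res;
}
```

With three dirty folios and no ->writepage, the pass frees nothing, queues
nothing, and can only ask the flushers for help, which is exactly the
situation where ignoring wakeup_flusher_threads() would hurt. The sketch
does not model when or for how long reclaim actually throttles.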
There would need to be some changes made if writepage always or often
failed, and there might be some premature throttling based on "NOPROGRESS"
on small systems due to dirty-not-writepage pages at the tail of the LRU,
but I don't think it would be an immediate disaster. Reclaim throttling is
no longer based on the ability to queue for writeback or on the "congestion"
state of BDIs, and some care is taken to not prematurely stall on
NOPROGRESS. However, if there was a bug related to premature stalls or
excessive CPU usage from direct reclaimers or kswapd that bisected to a
change in writepage, then it should be fixed on the vmscan side, with an
emphasis on handling "reclaim is in trouble when the bulk of reclaimable
pages are FS-only, dirty and not under writeback but ->writepage is a
no-op".

-- 
Mel Gorman
SUSE Labs