Re: remove iomap_writepage v2

Mel Gorman <mgorman@xxxxxxx> · Fri, 29 Jul 2022 10:22:16 +0100

On Thu, Jul 28, 2022 at 01:10:16PM +0200, Jan Kara wrote:
> Hi Christoph!
> 
> On Tue 19-07-22 06:13:07, Christoph Hellwig wrote:
> > this series removes iomap_writepage and it's callers, following what xfs
> > has been doing for a long time.
> 
> So this effectively means "no writeback from page reclaim for these
> filesystems" AFAICT (page migration of dirty pages seems to be handled by
> iomap_migrate_page()) which is going to make life somewhat harder for
> memory reclaim when memory pressure is high enough that dirty pages are
> reaching end of the LRU list. I don't expect this to be a problem on big
> machines but it could have some undesirable effects for small ones
> (embedded, small VMs). I agree per-page writeback has been a bad idea for
> efficiency reasons for at least last 10-15 years and most filesystems
> stopped dealing with more complex situations (like block allocation) from
> ->writepage() already quite a few years ago without any bug reports AFAIK.
> So it all seems like a sensible idea from FS POV but are MM people on board
> or at least aware of this movement in the fs land?
> 
> Added a few CC's for that.
> 

There is some context missing because it's not clear what the full impact is
but it is definitly the case that writepage is ignored in some contexts for
common filesystems so lets assume that writepage from reclaim context always
failed as a worst case scenario. Certainly this type of change is something
linux-mm needs to be aware of because we've been blind-sided before.

I don't think it would be incredibly damaging although there *might* be
issues with small systems or cgroups. In many respects, vmscan has been
moving in this direction for a long time e.g. f84f6e2b0868 ("mm: vmscan:
do not writeback filesystem pages in kswapd except in high priority") and
e2be15f6c3ee ("mm: vmscan: stall page reclaim and writeback pages based on
dirty/writepage pages encountered"). This was roughly 10 years ago when
it was clear that FS writeback from reclaim context was fragile (iirc it
was partially due to concerns about stack depth and later concerns that a
filesystem would simply ignore the writeback request). There also is less
reliance on stalling reclaim by queueing and waiting on writeback but we now
should explicitly throttle if no progress is being made e.g. 69392a403f49
("mm/vmscan: throttle reclaim when no progress is being made") and some
follow-up fixes.

There still is a reliance on swap and shmem pages will not ignore writepage
but I assume that is still ok.

One potential caveat is if wakeup_flusher_threads() is ignored because
there is a reliance in reclaim that if all the pages at the tail of the
LRU are dirty pages not queued for writeback then wakeup_flusher_threads()
will do something so pages get marked for immediate reclaim when writeback
completes. Of course there is no guarantee that flusher threads will start
writeback on the old pages and the pages could be backed by a slow BDI
but wakeup_flusher_threads() should not be ignored.

Another caveat is that comments become misleading. Take for example the
comment "Only kswapd can writeback filesystem folios to avoid risk of
stack overflow." The wording should change to note that writepage may
do nothing at all.  There also might need to be some adjustment on when
pages get marked for immediate reclaim when they are dirty but not under
writeback. pageout() might need a tracepoint for "mapping->a_ops->writepage
== NULL" to help debug problems around a failure to queue pages for writback
although that could be done as a test patch for a bug.

There would need to be some changes made if writepage always or often failed
and there might be some premature throttling based on "NOPROGRESS" on small
systems due to dirty-not-writepage pages at the tail of the LRU but I don't
think it would be an immediate disaster, Reclaim throttling is no longer
based on the ability to queue for writeback or "congestion" state of BDIs
and some care is taken to not prematurely stall on NOPROGRESS. However,
if there was a bug related to premature stalls or excessive CPU usage
from direct reclaimers or kswapd that bisected to a change in writepage
then it should be fixed on the vmscan-side to put an emphasis on handling
"reclaim is in trouble when the bulk of reclaimable pages are FS-only,
dirty and not under writeback but ->writepage is a no-op".

-- 
Mel Gorman
SUSE Labs