Axel Rasmussen writes:
> A couple of dumb questions. In your test, do you have any of the following configured / enabled?
>
> /proc/sys/vm/laptop_mode
> memory.low
> memory.min
None of these are enabled. The issue is trivially reproducible by writing to any slow device with memory.max set, but from the code it looks like MGLRU is also susceptible to this on global reclaim (although it's less likely there due to page diversity).
> Besides that, it looks like the place where non-MGLRU reclaim wakes up the flushers is shrink_inactive_list(), which calls wakeup_flusher_threads(). Since MGLRU calls shrink_folio_list() directly (from evict_folios()), I agree that it simply never does this. Yosry pointed out [1], where MGLRU used to call it but stopped doing so. It makes sense to me, at least, that doing writeback every time we age is too aggressive, but doing it in evict_folios() seems reasonable, basically to mirror the behavior that the non-MGLRU path (shrink_inactive_list()) already has.
Thanks! We may also need reclaim_throttle(), depending on how you implement it. Current non-MGLRU behaviour on slow storage is also highly suspect in terms of (lack of) throttling after moving away from VMSCAN_THROTTLE_WRITEBACK, but one thing at a time :-)
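For reference, here is a small userspace model of the decision that non-MGLRU reclaim makes after scanning a batch of folios, based on my reading of the check in shrink_inactive_list() in recent mainline (details differ between kernel versions). It wakes the flushers only when every isolated folio was dirty but not yet queued for writeback, and additionally throttles only when writeback throttling is not considered "sane" (e.g. cgroup v1). All names here (struct reclaim_batch, after_eviction(), enum reclaim_action) are hypothetical. This is a sketch of the condition, not kernel code and not a proposed patch for evict_folios().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified counters loosely modelled on what
 * shrink_inactive_list() looks at; not the kernel's real structures. */
struct reclaim_batch {
	unsigned long nr_taken;          /* folios isolated from the LRU */
	unsigned long nr_unqueued_dirty; /* dirty folios not yet queued for writeback */
};

enum reclaim_action {
	RECLAIM_NOTHING,           /* reclaim made (or can make) progress */
	RECLAIM_WAKE_FLUSHERS,     /* models wakeup_flusher_threads() */
	RECLAIM_WAKE_AND_THROTTLE, /* models wakeup + reclaim_throttle() */
};

/* Hypothetical helper: what to do after one eviction pass. */
static enum reclaim_action after_eviction(const struct reclaim_batch *b,
					  bool writeback_throttling_sane)
{
	/* A folio cannot be unqueued-dirty without having been isolated. */
	assert(b->nr_unqueued_dirty <= b->nr_taken);

	/* Nothing isolated, or some folios were clean or already under
	 * writeback: the flushers are not obviously falling behind. */
	if (b->nr_taken == 0 || b->nr_unqueued_dirty != b->nr_taken)
		return RECLAIM_NOTHING;

	/* Every isolated folio was dirty and unqueued: nudge the flushers,
	 * and throttle only if writeback throttling is not "sane". */
	return writeback_throttling_sane ? RECLAIM_WAKE_FLUSHERS
					 : RECLAIM_WAKE_AND_THROTTLE;
}
```

An MGLRU equivalent would presumably need the same counters from the shrink_folio_list() call in evict_folios(); whether it should also throttle there is the open question above.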