On Tue, 28 Apr 2020, Vlastimil Babka wrote:

> > I took a look at doing a quick-fix for the
> > direct-reclaimers-get-their-stuff-stolen issue about a million years
> > ago. I don't recall where it ended up. It's pretty trivial for the
> > direct reclaimer to free pages into current->reclaimed_pages and to
> > take a look in there on the allocation path, etc. But it's only
> > practical for order-0 pages.
>
> FWIW there's already such approach added to compaction by Mel some time ago,
> so order>0 allocations are covered to some extent. But in this case I imagine
> that compaction won't even start because order-0 watermarks are too low.
>
> The order-0 reclaim capture might work though - as a result the GFP_ATOMIC
> allocations would more likely fail and defer to their fallback context.
>

Yes, order-0 reclaim capture is interesting since the issue being reported
here is userspace going out to lunch because it loops for an unbounded
amount of time trying to get above a watermark where it's allowed to
allocate and other consumers are depleting that resource.

We actually prefer to oom kill earlier rather than being put in a
perpetual state of aggressive reclaim that affects all allocators and the
unbounded nature of those allocations leads to very poor results for
everybody.

I'm happy to scope this solely to an order-0 reclaim capture. I'm not
sure if I'm clear on whether this has been worked on before and patches
existed in the past?

Somewhat related to what I described in the changelog: we lost the "page
allocation stalls" artifacts in the kernel log for 4.15. The commit
description references an asynchronous mechanism for getting this
information; I don't know where this mechanism currently lives.
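
For illustration only, the order-0 reclaim capture idea discussed above
(the direct reclaimer frees pages into a per-task list and the allocation
path looks there first) could be modeled in userspace roughly as follows.
This is not kernel code and is not taken from any existing patch. Every
name here (struct zone_model, struct task_model, model_free_page(),
model_alloc_page(), model_end_reclaim()) is hypothetical, and the single
watermark check is a deliberate simplification of the real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct page {
	struct page *next;
};

/* Stands in for the shared zone free list and its min watermark. */
struct zone_model {
	struct page *free_list;
	unsigned long nr_free;
	unsigned long min_watermark;
};

/* Stands in for the per-task state, i.e. current->reclaimed_pages. */
struct task_model {
	struct page *reclaimed_pages;
	bool in_direct_reclaim;
};

static void list_push(struct page **head, struct page *page)
{
	page->next = *head;
	*head = page;
}

static struct page *list_pop(struct page **head)
{
	struct page *page = *head;

	if (page) {
		*head = page->next;
		page->next = NULL;
	}
	return page;
}

/*
 * Free path: a task in direct reclaim keeps the order-0 page it just
 * freed on its private list, so other consumers cannot take it.
 */
static void model_free_page(struct zone_model *zone,
			    struct task_model *task, struct page *page)
{
	assert(page != NULL);
	if (task && task->in_direct_reclaim) {
		list_push(&task->reclaimed_pages, page);
		return;
	}
	list_push(&zone->free_list, page);
	zone->nr_free++;
}

/*
 * Allocation path: look at the task's captured pages first, then fall
 * back to the shared list, which is only usable above the watermark.
 */
static struct page *model_alloc_page(struct zone_model *zone,
				     struct task_model *task)
{
	struct page *page = task ? list_pop(&task->reclaimed_pages) : NULL;

	if (page)
		return page;
	if (zone->nr_free <= zone->min_watermark)
		return NULL;
	page = list_pop(&zone->free_list);
	if (page)
		zone->nr_free--;
	return page;
}

/* Leaving direct reclaim: hand any unused captured pages back. */
static void model_end_reclaim(struct zone_model *zone,
			      struct task_model *task)
{
	struct page *page;

	task->in_direct_reclaim = false;
	while ((page = list_pop(&task->reclaimed_pages)) != NULL) {
		list_push(&zone->free_list, page);
		zone->nr_free++;
	}
}
```

In this model a page freed by the reclaimer never becomes visible on the
shared list until model_end_reclaim() runs, so a competing allocation
fails instead of consuming it. That mirrors the expectation in the quoted
text that competing allocations would more likely fail and defer to their
fallback context.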