On Thu, 6 Jun 2019, David Rientjes wrote: > The idea that I had was snipped from this, however, and it would be nice > to get some feedback on it: I've suggested that direct reclaim for the > purposes of hugepage allocation on the local node is never worthwhile > unless and until memory compaction can both capture that page to use (not > rely on the freeing scanner to find it) and that migration of a number of > pages would eventually result in the ability to free a pageblock. > > I'm hoping that we can all agree to that because otherwise it leads us > down a bad road if reclaim is doing pointless work (freeing scanner can't > find it or it gets allocated again before it can find it) or compaction > can't make progress as a result of it (even though we can migrate, it > still won't free a pageblock). > > In the interim, I think we should suppress direct reclaim entirely for > thp allocations, regardless of enabled=always or MADV_HUGEPAGE because it > cannot be proven that the reclaim work is beneficial and I believe it > results in the swap storms that are being reported. > > Any disagreements so far? > > Furthermore, if we can agree to that, memory compaction when allocating a > transparent hugepage fails for different reasons, one of which is because > we fail watermark checks because we lack migration targets. This is > normally what leads to direct reclaim. Compaction is *supposed* to return > COMPACT_SKIPPED for this but that's overloaded as well: it happens when we > fail extfrag_threshold checks and wheng gfp flags doesn't allow it. The > former matters for thp. > > So my proposed change would be: > - give the page allocator a consistent indicator that compaction failed > because we are low on memory (make COMPACT_SKIPPED really mean this), > - if we get this in the page allocator and we are allocating thp, fail, > reclaim is unlikely to help here and is much more likely to be > disruptive > - we could retry compaction if we haven't scanned all memory and > were contended, > - if the hugepage allocation fails, have thp check watermarks for order-0 > pages without any padding, > - if watermarks succeed, fail the thp allocation: we can't allocate > because of fragmentation and it's better to return node local memory, > - if watermarks fail, a follow up allocation of the pte will likely also > fail, so thp retries the allocation with a cleared __GFP_THISNODE. > > This doesn't sound very invasive and I'll code it up if it will be tested. > Following up on this since there has been no activity in a week, I am happy to prototype this. Andrea, would you be able to test a patch once it is ready for you to try?