Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

David Rientjes <rientjes@xxxxxxxxxx> · Fri, 21 Jun 2019 14:16:41 -0700 (PDT)

On Thu, 6 Jun 2019, David Rientjes wrote:

> The idea that I had was snipped from this, however, and it would be nice 
> to get some feedback on it: I've suggested that direct reclaim for the 
> purposes of hugepage allocation on the local node is never worthwhile 
> unless and until memory compaction can both capture that page to use (not 
> rely on the freeing scanner to find it) and that migration of a number of 
> pages would eventually result in the ability to free a pageblock.
> 
> I'm hoping that we can all agree to that because otherwise it leads us 
> down a bad road if reclaim is doing pointless work (freeing scanner can't 
> find it or it gets allocated again before it can find it) or compaction 
> can't make progress as a result of it (even though we can migrate, it 
> still won't free a pageblock).
> 
> In the interim, I think we should suppress direct reclaim entirely for 
> thp allocations, regardless of enabled=always or MADV_HUGEPAGE because it 
> cannot be proven that the reclaim work is beneficial and I believe it 
> results in the swap storms that are being reported.
> 
> Any disagreements so far?
> 
> Furthermore, if we can agree to that, memory compaction when allocating a 
> transparent hugepage fails for different reasons, one of which is because 
> we fail watermark checks because we lack migration targets.  This is 
> normally what leads to direct reclaim.  Compaction is *supposed* to return 
> COMPACT_SKIPPED for this but that's overloaded as well: it happens when we 
> fail extfrag_threshold checks and wheng gfp flags doesn't allow it.  The 
> former matters for thp.
> 
> So my proposed change would be:
>  - give the page allocator a consistent indicator that compaction failed
>    because we are low on memory (make COMPACT_SKIPPED really mean this),
>  - if we get this in the page allocator and we are allocating thp, fail,
>    reclaim is unlikely to help here and is much more likely to be
>    disruptive
>      - we could retry compaction if we haven't scanned all memory and
>        were contended,
>  - if the hugepage allocation fails, have thp check watermarks for order-0 
>    pages without any padding,
>  - if watermarks succeed, fail the thp allocation: we can't allocate
>    because of fragmentation and it's better to return node local memory,
>  - if watermarks fail, a follow up allocation of the pte will likely also
>    fail, so thp retries the allocation with a cleared  __GFP_THISNODE.
> 
> This doesn't sound very invasive and I'll code it up if it will be tested.
> 

Following up on this since there has been no activity in a week, I am 
happy to prototype this.  Andrea, would you be able to test a patch once 
it is ready for you to try?