Re: [PATCH 0/2] fix for "pathological THP behavior"

"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> · Mon, 20 Aug 2018 14:58:18 +0300

On Sun, Aug 19, 2018 at 11:22:02PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> we detected a regression compared to older kernels, only happening
> with defrag=always or by using MADV_HUGEPAGE (and QEMU uses it).
> 
> I haven't bisected but I suppose this started since commit
> 5265047ac30191ea24b16503165000c225f54feb combined with previous
> commits that introduced the logic to not try to invoke reclaim for THP
> allocations in the remote nodes.
> 
> Once I looked into it the problem was pretty obvious and there are two
> possible simple fixes, one is not to invoke reclaim and stick to
> compaction in the local node only (still __GFP_THISNODE model).
> 
> This approach keeps the logic the same and prioritizes NUMA
> locality over THP generation.
> 
> Then I'll send an alternative that drops the __GFP_THISNODE logic
> if __GFP_DIRECT_RECLAIM is set. That however changes the behavior for
> MADV_HUGEPAGE and prioritizes THP generation over NUMA locality.
> 
> A possible incremental improvement for this __GFP_COMPACT_ONLY
> solution would be to remove __GFP_THISNODE (and in turn
> __GFP_COMPACT_ONLY) after checking the watermarks if there's no free
> PAGE_SIZEd memory in the local node. However, checking the watermarks
> in mempolicy.c is not ideal, so it would be a messier change and
> it'd still need to use __GFP_COMPACT_ONLY as implemented here for when
> there's no PAGE_SIZEd free memory in the local node. That further
> improvement wouldn't be necessary if there's agreement to prioritize
> THP generation over NUMA locality (the alternative solution I'll send
> in a separate post).

I personally prefer to prioritize NUMA locality over THP
(__GFP_COMPACT_ONLY variant), but I don't know page-alloc/compaction well
enough to Ack it.
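
For anyone following the thread, the difference between the two variants
can be sketched roughly as below. This is a toy userspace model, not the
patch itself: the TOY_* bit values and both helper names are made up for
illustration, and the real flag handling in mempolicy.c and the page
allocator is more involved.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration only; NOT the kernel's values. */
#define TOY_DIRECT_RECLAIM (1u << 0)  /* stands in for __GFP_DIRECT_RECLAIM */
#define TOY_THISNODE       (1u << 1)  /* stands in for __GFP_THISNODE */
#define TOY_COMPACT_ONLY   (1u << 2)  /* stands in for __GFP_COMPACT_ONLY */

/*
 * Variant 1 (NUMA locality first): always stay on the local node, and
 * when the caller may enter direct reclaim, restrict it to compaction.
 */
unsigned int toy_thp_gfp_locality_first(unsigned int gfp)
{
	gfp |= TOY_THISNODE;
	if (gfp & TOY_DIRECT_RECLAIM)
		gfp |= TOY_COMPACT_ONLY;
	return gfp;
}

/*
 * Variant 2 (THP generation first): when the caller may enter direct
 * reclaim, drop the local-node restriction so remote nodes can be used.
 */
unsigned int toy_thp_gfp_thp_first(unsigned int gfp)
{
	if (gfp & TOY_DIRECT_RECLAIM)
		return gfp & ~TOY_THISNODE;
	return gfp | TOY_THISNODE;
}
```

In this model an allocation that may reclaim stays local and compaction-only
under variant 1, while under variant 2 it loses the local-node restriction
and may be satisfied from a remote node.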

-- 
 Kirill A. Shutemov