Hello,

we detected a regression compared to older kernels that only happens
with defrag=always or when using MADV_HUGEPAGE (which QEMU does).

I haven't bisected, but I suppose this started with commit
5265047ac30191ea24b16503165000c225f54feb, combined with the previous
commits that introduced the logic to not invoke reclaim for THP
allocations on the remote nodes.

Once I looked into it the problem was pretty obvious, and there are two
possible simple fixes. The first is to not invoke reclaim and to stick
to compaction in the local node only (still the __GFP_THISNODE model).
This approach keeps the logic the same and prioritizes NUMA locality
over THP generation.

I'll then send an alternative that drops the __GFP_THISNODE logic if
__GFP_DIRECT_RECLAIM is set. That however changes the behavior for
MADV_HUGEPAGE and prioritizes THP generation over NUMA locality.

A possible incremental improvement for this __GFP_COMPACT_ONLY solution
would be to remove __GFP_THISNODE (and in turn __GFP_COMPACT_ONLY)
after checking the watermarks, if there's no free PAGE_SIZEd memory in
the local node. However, checking the watermarks in mempolicy.c is not
ideal, so it would be a messier change, and it would still need to use
__GFP_COMPACT_ONLY as implemented here for the case where the local
node does have free PAGE_SIZEd memory. That further improvement
wouldn't be necessary if there's agreement to prioritize THP generation
over NUMA locality (the alternative solution I'll send in a separate
post).

Andrea Arcangeli (2):
  mm: thp: consolidate policy_nodemask call
  mm: thp: fix transparent_hugepage/defrag = madvise || always

 include/linux/gfp.h | 18 ++++++++++++++++++
 mm/mempolicy.c      | 16 +++++++++++++---
 mm/page_alloc.c     |  4 ++++
 3 files changed, 35 insertions(+), 3 deletions(-)
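
For illustration only, the two approaches described in this cover
letter can be modelled in userspace C roughly as follows. This is a
sketch, not the code in the patches: the sk_* names are hypothetical
and the flag bit values are made up for the example, so they do not
match the kernel's real gfp encoding. It only shows how each approach
would adjust the flags of a THP allocation, and that neither of them
leaves direct reclaim confined to the local node.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real gfp encoding. */
typedef unsigned int sk_gfp_t;
#define SK_GFP_DIRECT_RECLAIM 0x1u /* caller may enter direct reclaim/compaction */
#define SK_GFP_THISNODE       0x2u /* allocate from the local node only */
#define SK_GFP_COMPACT_ONLY   0x4u /* models the flag proposed by this series */

/*
 * First approach (this series): always stay on the local node, and if the
 * caller is allowed to enter direct reclaim, restrict it to compaction.
 */
static inline sk_gfp_t sk_thp_gfp_compact_only(sk_gfp_t gfp)
{
	if (gfp & SK_GFP_DIRECT_RECLAIM)
		gfp |= SK_GFP_COMPACT_ONLY;
	return gfp | SK_GFP_THISNODE;
}

/*
 * Alternative approach: pin the allocation to the local node only when the
 * caller cannot enter direct reclaim; otherwise let it fall back to other
 * nodes.
 */
static inline sk_gfp_t sk_thp_gfp_drop_thisnode(sk_gfp_t gfp)
{
	if (!(gfp & SK_GFP_DIRECT_RECLAIM))
		gfp |= SK_GFP_THISNODE;
	return gfp;
}

/* True if the flags would allow full reclaim confined to the local node. */
static inline bool sk_local_reclaim_possible(sk_gfp_t gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) &&
	       (gfp & SK_GFP_THISNODE) &&
	       !(gfp & SK_GFP_COMPACT_ONLY);
}

/* Small self-check: neither approach leaves local-only reclaim enabled. */
static inline void sk_selfcheck(void)
{
	sk_gfp_t madvised = SK_GFP_DIRECT_RECLAIM; /* e.g. an MADV_HUGEPAGE fault */

	/* The combination both approaches are meant to avoid. */
	assert(sk_local_reclaim_possible(madvised | SK_GFP_THISNODE));

	assert(!sk_local_reclaim_possible(sk_thp_gfp_compact_only(madvised)));
	assert(!sk_local_reclaim_possible(sk_thp_gfp_drop_thisnode(madvised)));
}
```

In this model the difference between the two is what is given up: the
first keeps SK_GFP_THISNODE set in every case (NUMA locality first),
the second clears the way to remote nodes whenever direct reclaim is
allowed (THP generation first).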