On Tue, 21 Aug 2018, Vlastimil Babka wrote:

> Frankly, I would rather go with this option and assume that if someone
> explicitly wants THP's, they don't care about NUMA locality that much.
> (Note: I hate __GFP_THISNODE, it's an endless source of issues.)
> Trying to be clever about "is there still PAGE_SIZEd free memory in the
> local node" is imperfect anyway. If there isn't, is it because there's
> clean page cache that we can easily reclaim (so it would be worth
> staying local) or is it really exhausted? Watermark check won't tell...
>

MADV_HUGEPAGE (or defrag == "always") would now become a combination of
"try to compact locally" and "allocate remotely if necessary", without the
ability to avoid the latter absent a mempolicy that affects all memory
allocations.  I think the complete solution would be an MPOL_F_HUGEPAGE
flag that defines mempolicies for hugepage allocations.  In my experience,
THP falling back to remote nodes for intrasocket latency is a win, but
intersocket or two-hop intersocket latency is a no-go.
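
For reference, a minimal userspace sketch of the opt-in being discussed:
an anonymous mapping advised with madvise(MADV_HUGEPAGE).  The helper name
map_thp_region() and the REGION_SIZE constant are hypothetical, chosen only
for illustration.  Note that the only way to constrain remote fallback for
such a region today is a mempolicy (e.g. via mbind(2)), which applies to
every allocation in the range and not just to hugepages; MPOL_F_HUGEPAGE is
a proposal and does not exist, so it does not appear below.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical size: 4 MiB, assumed large enough to hold a 2 MiB THP. */
#define REGION_SIZE (4UL << 20)

/*
 * Map an anonymous private region and opt it in to transparent hugepages.
 * Returns the mapping, or NULL if mmap() fails.  *advised is set to 1 if
 * madvise(MADV_HUGEPAGE) succeeded and 0 otherwise (for example on a
 * kernel built without THP support).
 */
static void *map_thp_region(size_t len, int *advised)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/*
	 * This only expresses "I want THP here".  Whether the hugepage is
	 * allocated locally or remotely is decided by the kernel and by
	 * any mempolicy in effect, not by this call.
	 */
	*advised = (madvise(p, len, MADV_HUGEPAGE) == 0);
	return p;
}
```

A caller would invoke map_thp_region(REGION_SIZE, &advised), touch the
memory to fault it in, and munmap() it when done.  Nothing in this
interface lets the caller say "local hugepage or else local small pages",
which is the gap a hugepage-specific mempolicy flag would fill.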