On Wed, Aug 22, 2018 at 11:02:14AM +0200, Michal Hocko wrote:
> I am not disputing the bug itself. How hard should defrag=always really
> try is a good question and I would say different people would have
> different ideas but a swapping storm sounds like genuinely unwanted
> behavior. I would expect that to be handled in the reclaim/compaction.
> GFP_TRANSHUGE doesn't have ___GFP_RETRY_MAYFAIL so it shouldn't really
> try too hard to reclaim.

Everything was OK as long as the THP allocation was not bound to the
local node with __GFP_THISNODE. Calling reclaim to free memory is part
of how compaction works if all free memory has been exhausted on all
nodes. At that point it is much more likely that compaction fails not
because there is less than 2M free, but because all memory is
fragmented. So it's true that MADV_HUGEPAGE may run better on non-NUMA
systems by not setting __GFP_COMPACT_ONLY, though (i.e. like right now;
__GFP_THISNODE would be a noop there).

As for how hard defrag=always should try, I think it should at least
call compaction once, so that when there is plenty of free memory in
the local node it has a chance. That sounds like a sure win. Calling
compaction with __GFP_THISNODE will at least defrag all free memory,
which gives MADV_HUGEPAGE a chance.

> I still have to digest the __GFP_THISNODE thing but I _think_ that the
> alloc_pages_vma code is just trying to be overly clever and
> __GFP_THISNODE is not a good fit for it.

My option 2 did just that: it removed __GFP_THISNODE, but only for
MADV_HUGEPAGE and, in general, whenever reclaim was activated by
__GFP_DIRECT_RECLAIM. That is also a signal that the user really wants
THP, so it is less bad to prefer THP over NUMA locality.

The default is tuned for short-lived allocations, where THP can't help
much and preferring local memory is most certainly the better win; this
is why I didn't remove __GFP_THISNODE from the default defrag policy.

Thanks,
Andrea
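
For illustration only, the option 2 policy described above can be
sketched roughly as below. This is not the actual patch: the flag bits
and the names SKETCH_GFP_DIRECT_RECLAIM, SKETCH_GFP_THISNODE,
sketch_thp_gfp() and sketch_self_check() are hypothetical stand-ins
that only model the decision, not the kernel's real definitions or the
alloc_pages_vma code.

```c
#include <assert.h>

/* Hypothetical stand-in bits, NOT the kernel's real gfp values. */
#define SKETCH_GFP_DIRECT_RECLAIM (1u << 0)
#define SKETCH_GFP_THISNODE       (1u << 1)

/*
 * Model of the policy: pin the THP allocation to the local node only
 * when the caller did not ask for direct reclaim (the default,
 * short-lived case). When direct reclaim is requested (assumed here to
 * stand for MADV_HUGEPAGE or defrag=always), drop the local-node
 * binding so THP is preferred over NUMA locality.
 */
static unsigned int sketch_thp_gfp(unsigned int gfp)
{
	if (gfp & SKETCH_GFP_DIRECT_RECLAIM)
		return gfp & ~SKETCH_GFP_THISNODE;
	return gfp | SKETCH_GFP_THISNODE;
}

/* Small self-check of both branches of the model. */
static void sketch_self_check(void)
{
	/* Default policy: no direct reclaim, so stay on the local node. */
	assert(sketch_thp_gfp(0u) == SKETCH_GFP_THISNODE);

	/* Direct reclaim requested: the local-node binding is removed. */
	assert(sketch_thp_gfp(SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_THISNODE)
	       == SKETCH_GFP_DIRECT_RECLAIM);
}
```

Calling sketch_self_check() from a small test program exercises both
cases. The point of the split is that the reclaim bit doubles as the
signal that the user really wants THP.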