Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

On Wed 29-08-18 18:54:23, Zi Yan wrote:
[...]
> I tested it against Linus’s tree with “memhog -r3 130g” on a two-socket machine with 128GB of memory on
> each node and got the results below. I expected this test to fill one node and then fall back to the other.
> 
> 1. madvise(MADV_HUGEPAGE) + defrag = {always, madvise, defer+madvise}:
> no swap, THPs are allocated in the fallback node.
> 2. madvise(MADV_HUGEPAGE) + defrag = defer: pages were swapped out to
> disk instead of being allocated in the fallback node.
> 3. no madvise, THP is on by default + defrag = {always, defer,
> defer+madvise}: pages were swapped out to disk instead of being
> allocated in the fallback node.
> 4. no madvise, THP is on by default + defrag = madvise: no swap, base
> pages are allocated in the fallback node.
> 
> Results 2 and 3 seem unexpected, since pages should be allocated in the fallback node.
> 
> The reason, as Andrea mentioned in his email, is the combination
> of __GFP_THISNODE and __GFP_DIRECT_RECLAIM (plus __GFP_KSWAPD_RECLAIM
> in this experiment).

But we do not set __GFP_THISNODE along with __GFP_DIRECT_RECLAIM AFAICS.
We do for __GFP_KSWAPD_RECLAIM, though, and I guess it is expected to
see kswapd reclaim in order to balance the node. If the node is full of
anonymous pages, then there is no other way than to swap out.

> __GFP_THISNODE uses ZONELIST_NOFALLBACK, which
> removes the fallback possibility, and __GFP_*_RECLAIM triggers page
> reclaim on the first allocation node once the fallback nodes have been
> removed by ZONELIST_NOFALLBACK.

Yes, but the point is that allocations which use __GFP_THISNODE are
optimistic, so they shouldn't fall back to remote NUMA nodes.

> IMHO, __GFP_THISNODE should not be used for user memory allocations at
> all, since it conflicts with most memory policies. But kernel
> memory allocations would need it, as a kernel MPOL_BIND-style memory policy.

__GFP_THISNODE is indeed ugly. I would really love to get rid of
it here. The problem is that optimistic THP allocations should
prefer the local node, because a remote node might easily offset the
advantage of the THP. I do not have a great idea how to achieve that
without __GFP_THISNODE, though.
-- 
Michal Hocko
SUSE Labs