Re: [RFC PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent

Oscar Salvador <osalvador@xxxxxxx> · Fri, 23 Feb 2024 15:19:34 +0100

On Fri, Feb 23, 2024 at 10:56:48AM +0800, Baolin Wang wrote:

> I previously considered passing the MR_reason argument to the
> htlb_modify_alloc_mask(), which is only used by hugetlb migration.
> But in alloc_hugetlb_folio_nodemask(), if there are available hugetlb on
> other nodes, we should allow migrating, that will not break the per-node
> hugetlb pool.
> 
> That's why I just change the gfp_mask for allocating a new hguetlb when
> migration, that can break the pool.

Code-wise I think this is good, but I'm having some feelings
about where filter out the mask.
Ok, I'm trying to get my head around this.
It's been a while since I looked into hugetlb code, so here we go.

You mentioned that the only reason not to fiddle with gfp_mask before calling
in alloc_hugetlb_folio_nodemask(), was that we might be able to find a hugetlb
page in another node, and that that's ok because since all nodes remain with
the same number of hugetlb pages, per-node pool doesn't get broken.

Now, I see that dequeue_hugetlb_folio_nodemask() first tries to get the zonelist
of the preferred node, and AFAICS, if it has !GFP_THISNODE, it should also
get the zonelists of all other nodes, so we might fallback.

In the hope of finding a way to be able to filter out in htlb_modify_alloc_mask(),
I was trying to see whether we could skip GFP_THISNODE in
dequeue_hugetlb_folio_nodemask() but no because we might end up dequeueing
a hugetlb which sits in another node, while we really specified __GFP_THISNODE.

The only way might be to somehow decouple dequeue_hugetlb_folio_nodemask()
from alloc_hugetlb_folio_nodemask() and do some kind of gfp modification
between the two calls.

Another thing I dislike is the "-1" in alloc_hugetlb_folio_vma().
I think at least it deserves a comment like "Passing -1 will make us stick
to GFP_THISNODE".
Although that is another thing, we will pass "-1" which forces GFP_THISNODE
when allocating a newly fresh hugetlb page, but in dequeue_hugetlb_folio_nodemask()
we might get a page from a different node.
That doesn't break per-node pool, but it is somehow odd?

-- 
Oscar Salvador
SUSE Labs