Re: [PATCH] mm/hugetlb: try preferred node first when alloc gigantic page from cma

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Mon, 31 Aug 2020 14:44:40 -0700

On 8/30/20 7:04 AM, Li Xinhai wrote:
> Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
> hugepages using cma"), the gigantic page would be allocated from node
> which is not the preferred node, although there are pages available from
> that node. The reason is that the nid parameter has been ignored in
> alloc_gigantic_page().
> 
> After this patch, the preferred node is tried first before other allowed
> nodes.

Thank you!
This is an issue that needs to be fixed.

> Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> Cc: Roman Gushchin <guro@xxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Signed-off-by: Li Xinhai <lixinhai.lxh@xxxxxxxxx>
> ---
>  mm/hugetlb.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a301c2d672bf..4a28b8853d47 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>  		struct page *page;
>  		int node;
>  
> +		if (hugetlb_cma[nid]) {
> +			page = cma_alloc(hugetlb_cma[nid], nr_pages,
> +					huge_page_order(h), true);
> +			if (page)
> +				return page;
> +		}
> +

When looking at your changes, I noticed that this code for allocation
from CMA does not take gfp_mask into account.  The 'normal' use case
is to allocate pool pages with something similar to:

echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

The routine alloc_pool_huge_page will try to interleave pages among nodes:

	...
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
	...

which will eventually call alloc_gigantic_page.  If __GFP_THISNODE is
set, we really do not want to execute the for loop below in
alloc_gigantic_page, because it falls back to the other nodes in the nodemask.

I think the convention in the mm code is that only the lowest level
allocation routines should interpret the GFP flags.  We may need to make
an exception here and check for __GFP_THISNODE.
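
To make the concern concrete, here is a rough userspace model of what such
a check could look like.  It is only a sketch, not a proposed patch:
MODEL_GFP_THISNODE, cma_avail[], model_cma_alloc() and
model_alloc_gigantic_page() are hypothetical stand-ins for __GFP_THISNODE,
hugetlb_cma[], cma_alloc() and alloc_gigantic_page(), and the non-CMA
fallback path is not modelled at all.

```c
#include <stdbool.h>

/* Userspace model only: none of these are the kernel definitions. */
#define MAX_NODES		4
#define MODEL_GFP_THISNODE	0x1u	/* stand-in for __GFP_THISNODE */

/* cma_avail[n] > 0 models "hugetlb_cma[n] exists and cma_alloc() succeeds". */
static int cma_avail[MAX_NODES];

/* Returns the node the page came from, or -1 if the allocation failed. */
static int model_cma_alloc(int node)
{
	if (node < 0 || node >= MAX_NODES || cma_avail[node] <= 0)
		return -1;
	cma_avail[node]--;
	return node;
}

static int model_alloc_gigantic_page(unsigned int gfp_mask, int nid,
				     const bool *allowed)
{
	int node, got;

	/* Try the preferred node first, as the patch does. */
	got = model_cma_alloc(nid);
	if (got >= 0)
		return got;

	/* Assumed exception: with THISNODE set, never try other nodes. */
	if (gfp_mask & MODEL_GFP_THISNODE)
		return -1;

	/* Otherwise fall back to the remaining allowed nodes. */
	for (node = 0; node < MAX_NODES; node++) {
		if (node == nid || !allowed[node])
			continue;
		got = model_cma_alloc(node);
		if (got >= 0)
			return got;
	}
	return -1;
}
```

With the flag set and nothing available on the preferred node, the model
fails instead of silently taking a page from a neighbouring node, which is
the behaviour the interleaving in alloc_pool_huge_page depends on.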

Michal would be the best person to comment and perhaps make a recommendation.

-- 
Mike Kravetz

>  		for_each_node_mask(node, *nodemask) {
> -			if (!hugetlb_cma[node])
> +			if (node == nid || !hugetlb_cma[node])
>  				continue;
>  
>  			page = cma_alloc(hugetlb_cma[node], nr_pages,
>