On 2025-02-11 04:06, Oscar Salvador wrote:
On Mon, Feb 10, 2025 at 10:48:56PM -0500, Luiz Capitulino wrote:
When the HugeTLB kernel command line is used to allocate 1G pages from
a specific node, such as:

    default_hugepagesz=1G hugepages=1:1

the allocation falls back to other nodes if node 1 happens not to have
enough memory for the requested number of 1G pages. A quick way to
reproduce this is to create a KVM guest with a memory-less node and try
to allocate one 1G page from it. Instead of failing, the allocation
will fall back to other nodes.
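[ As an illustration, such a guest could be set up with QEMU along these
  lines; the exact flags and the kernel image path are assumptions for
  the sketch, not taken from the report:

    # Node 0 gets all the memory; node 1 gets a CPU but no memory
    # backend, i.e. a memory-less node. The bzImage path is hypothetical.
    qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G \
        -kernel /path/to/bzImage \
        -append "default_hugepagesz=1G hugepages=1:1" \
        -object memory-backend-ram,id=m0,size=4G \
        -numa node,nodeid=0,memdev=m0,cpus=0 \
        -numa node,nodeid=1,cpus=1

  On an unpatched kernel, the 1G page requested for node 1 silently
  lands on node 0 instead of the allocation failing. ]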
This defeats the purpose of node-specific allocation. Also, node-specific
allocation of 2M pages doesn't have this behavior: the allocation simply
fails for the pages it cannot satisfy.
This issue happens because HugeTLB calls memblock_alloc_try_nid_raw()
for the 1G boot-time allocation, and that function falls back to other
nodes when the allocation cannot be satisfied on the requested node. Use
memblock_alloc_exact_nid_raw() instead, which guarantees that the
allocation is only satisfied from the specified node.
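[ Concretely, the fix amounts to switching the allocator call in the
  node-specific branch of __alloc_bootmem_huge_page() in mm/hugetlb.c;
  a minimal sketch of that branch, assuming the surrounding code at the
  time of the fix:

    /* mm/hugetlb.c: __alloc_bootmem_huge_page(), node-specific branch */
    if (nid != NUMA_NO_NODE) {
        /*
         * memblock_alloc_try_nid_raw() would retry on other nodes when
         * 'nid' cannot satisfy the request; the _exact_ variant only
         * succeeds if the memory actually comes from the requested node.
         */
        m = memblock_alloc_exact_nid_raw(huge_page_size(h),
                huge_page_size(h), 0,
                MEMBLOCK_ALLOC_ACCESSIBLE, nid);
        if (!m)
            return 0;   /* fail instead of falling back */
        goto found;
    }

  Both helpers take the same arguments (size, align, min_addr, max_addr,
  nid), so the switch is a drop-in replacement. ]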
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Luiz Capitulino <luizcap@xxxxxxxxxx>
Acked-by: Oscar Salvador <osalvador@xxxxxxx>
This was discussed yesterday in [1]; CCing Frank for awareness.
[1] https://patchwork.kernel.org/project/linux-mm/patch/20250206185109.1210657-6-fvdl@xxxxxxxxxx/
Interesting, thanks for the reference.
I stumbled upon this issue back in December when debugging a HugeTLB
problem at Red Hat (David knows it ;) ) and have had this patch pending
for more than a week now...