On Fri, 21 Jun 2024 17:56:09 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 21 Jun 2024 15:00:50 -0400 Aristeu Rozanski <aris@xxxxxxxxxx> wrote:
>
> > allowed_mems_nr() only accounts for the number of free hugepages in the
> > nodes the current process belongs to. In case one or more of the requested
> > surplus hugepages are allocated in a different node, the whole allocation
> > will fail due to allowed_mems_nr() returning a lower value.
> >
> > So allocate surplus hugepages in one of the nodes the current process
> > belongs to.
> >
> > An easy way to reproduce this issue is to use a system with 2+ NUMA nodes:
> >
> >   # echo 0 >/proc/sys/vm/nr_hugepages
> >   # echo 1 >/proc/sys/vm/nr_overcommit_hugepages
> >   # numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2
> >
> > This will eventually fail when the hugepage ends up allocated in a
> > different node.
>
> Should we backport this into -stable kernels?

Please?  Aristeu, a description of the userspace-visible effects of the bug
will really help things along here.