The patch titled
     Subject: mm/hugetlb: update nr_huge_pages and surplus_huge_pages together
has been added to the -mm mm-unstable branch.  Its filename is
     mm-hugetlb-update-nr_huge_pages-and-surplus_huge_pages-together.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-update-nr_huge_pages-and-surplus_huge_pages-together.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Liu Shixin <liushixin2@xxxxxxxxxx>
Subject: mm/hugetlb: update nr_huge_pages and surplus_huge_pages together
Date: Wed, 5 Mar 2025 11:54:09 +0800

In alloc_surplus_hugetlb_folio(), we increase nr_huge_pages and
surplus_huge_pages separately, in two different hugetlb_lock cycles.  In
the window between the two updates, if nr_hugepages is set to a smaller
value so that count < persistent_huge_pages(h), surplus_huge_pages will
be increased by adjust_pool_surplus().

After adding a delay in that window, the problem can be reproduced easily
with the following steps:

1. echo 3 > /proc/sys/vm/nr_overcommit_hugepages
2. mmap two hugepages.  When nr_huge_pages=2 and surplus_huge_pages=1,
   go to step 3.
3. echo 0 > /proc/sys/vm/nr_hugepages

In the end, nr_huge_pages is less than surplus_huge_pages.

To fix the problem, call only_alloc_fresh_hugetlb_folio() instead and
move __prep_account_new_huge_page() down into the hugetlb_lock critical
section.

Link: https://lkml.kernel.org/r/20250305035409.2391344-1-liushixin2@xxxxxxxxxx
Fixes: 0c397daea1d4 ("mm, hugetlb: further simplify hugetlb allocation API")
Signed-off-by: Liu Shixin <liushixin2@xxxxxxxxxx>
Acked-by: Peter Xu <peterx@xxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
Cc: Liu Shixin <liushixin2@xxxxxxxxxx>
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: Oscar Salvador <osalvador@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/hugetlb.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/mm/hugetlb.c~mm-hugetlb-update-nr_huge_pages-and-surplus_huge_pages-together
+++ a/mm/hugetlb.c
@@ -2259,12 +2259,21 @@ static struct folio *alloc_surplus_huget
 		goto out_unlock;
 	spin_unlock_irq(&hugetlb_lock);

-	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
 	if (!folio)
 		return NULL;

+	hugetlb_vmemmap_optimize_folio(h, folio);
+
 	spin_lock_irq(&hugetlb_lock);
 	/*
+	 * nr_huge_pages needs to be adjusted within the same lock cycle
+	 * as surplus_pages, otherwise it might confuse
+	 * persistent_huge_pages() momentarily.
+	 */
+	__prep_account_new_huge_page(h, nid);
+
+	/*
 	 * We could have raced with the pool size change.
 	 * Double check that and simply deallocate the new page
 	 * if we would end up overcommiting the surpluses.  Abuse
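
To see the ordering problem in isolation, here is a minimal userspace
model of the race described in the changelog above.  It is a sketch, not
kernel code: the mutex stands in for hugetlb_lock, the two plain counters
stand in for h->nr_huge_pages and h->surplus_huge_pages, and the two
thread functions are hypothetical stand-ins for the surplus allocation
path and for the pool shrink done via adjust_pool_surplus().  The sleeps
merely widen the race window, like the delay mentioned in the reproducer.

/* race_model.c - illustrative only; build with: gcc -O2 -pthread race_model.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER; /* hugetlb_lock stand-in */
static long nr_pages;      /* stands in for h->nr_huge_pages */
static long surplus_pages; /* stands in for h->surplus_huge_pages */

/* stands in for persistent_huge_pages(): pages that are not surplus */
static long persistent_pages(void)
{
	return nr_pages - surplus_pages;
}

/* Buggy ordering: the two counters are bumped in separate lock cycles. */
static void *alloc_surplus_buggy(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pool_lock);
	nr_pages++;			/* first critical section */
	pthread_mutex_unlock(&pool_lock);

	usleep(10000);			/* the window between the two updates */

	pthread_mutex_lock(&pool_lock);
	surplus_pages++;		/* second critical section */
	pthread_mutex_unlock(&pool_lock);
	return NULL;
}

/* models "echo 0 > /proc/sys/vm/nr_hugepages": persistent pages become surplus */
static void *shrink_pool(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pool_lock);
	while (persistent_pages() > 0)
		surplus_pages++;	/* like adjust_pool_surplus(h, ..., 1) */
	pthread_mutex_unlock(&pool_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, s;

	/* the state from step 2 of the reproducer */
	nr_pages = 2;
	surplus_pages = 1;

	pthread_create(&a, NULL, alloc_surplus_buggy, NULL);
	usleep(1000);		/* let the shrink land inside the race window */
	pthread_create(&s, NULL, shrink_pool, NULL);

	pthread_join(a, NULL);
	pthread_join(s, NULL);

	/* With the buggy ordering this typically prints nr=3 surplus=4. */
	printf("nr=%ld surplus=%ld persistent=%ld\n",
	       nr_pages, surplus_pages, persistent_pages());
	return 0;
}

With this buggy ordering the shrink thread can run between the two
critical sections, so the program typically ends with surplus greater
than nr, which is the inconsistency the patch closes by doing both
counter updates under a single hugetlb_lock hold.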
_

Patches currently in -mm which might be from liushixin2@xxxxxxxxxx are

mm-page_isolation-avoid-call-folio_hstate-without-hugetlb_lock.patch
mm-hugetlb-update-nr_huge_pages-and-surplus_huge_pages-together.patch