+ hugetlb-prioritize-surplus-allocation-from-current-node.patch added to mm-unstable branch

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 04 Dec 2024 13:52:35 -0800

The patch titled
     Subject: hugetlb: prioritize surplus allocation from current node
has been added to the -mm mm-unstable branch.  Its filename is
     hugetlb-prioritize-surplus-allocation-from-current-node.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/hugetlb-prioritize-surplus-allocation-from-current-node.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days.

------------------------------------------------------
From: Koichiro Den <koichiro.den@xxxxxxxxxxxxx>
Subject: hugetlb: prioritize surplus allocation from current node
Date: Thu, 5 Dec 2024 01:55:03 +0900

Previously, surplus allocations triggered by mmap were typically made from
the node where the process was running.  On a page fault, the area was
reliably dequeued from the hugepage_freelists for that node.  However,
since commit 003af997c8a9 ("hugetlb: force allocating surplus hugepages on
mempolicy allowed nodes"), dequeue_hugetlb_folio_vma() may fall back to
other nodes unnecessarily even if there is no MPOL_BIND policy, causing
folios to be dequeued from nodes other than the current one.

Also, allocating from the node where the current process is running is
likely to result in a performance win, as mmap-ing processes often touch
the area soon after allocation.  This change minimizes surprises for
users relying on the previous behavior while maintaining the benefit
introduced by commit 003af997c8a9.

So, prioritize the node the current process is running on when possible.
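
As a rough illustration of the resulting try order, here is a minimal
user-space sketch.  It is not kernel code: the nodemask typedef, MAX_NODES,
build_alloc_mask() and alloc_try_order() are hypothetical stand-ins for
nodemask_t, the mask setup, and the allocation loop in
gather_surplus_pages(), and a plain 64-bit bitmask is assumed in place of
the kernel nodemask API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64	/* assumption: at most 64 NUMA nodes in this model */

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef unsigned long long nodemask;

static bool node_in_mask(int node, nodemask mask)
{
	return node >= 0 && node < MAX_NODES && ((mask >> node) & 1ULL);
}

/*
 * Models the mask setup: intersect the MPOL_BIND mask with the cpuset mask
 * when a bind policy exists, otherwise use the cpuset mask alone.
 */
static nodemask build_alloc_mask(const nodemask *mbind, nodemask cpuset_allowed)
{
	return mbind ? (*mbind & cpuset_allowed) : cpuset_allowed;
}

/*
 * Writes the nodes to try, in order, into out[] and returns how many there
 * are.  The current node comes first if it is allowed; the remaining
 * allowed nodes follow in ascending order, skipping the current node.
 */
static size_t alloc_try_order(int current_node, nodemask alloc_mask,
			      int out[MAX_NODES])
{
	size_t n = 0;

	assert(out != NULL);

	/* Prioritize the current node. */
	if (node_in_mask(current_node, alloc_mask))
		out[n++] = current_node;

	/* Fall back to every other allowed node. */
	for (int node = 0; node < MAX_NODES; node++) {
		if (node == current_node)
			continue;
		if (node_in_mask(node, alloc_mask))
			out[n++] = node;
	}
	return n;
}
```

In this model, a task running on node 2 with an MPOL_BIND mask of nodes
1-2 and a cpuset of nodes 0-3 would try node 2 and then node 1.  The real
loop stops at the first node that yields a folio rather than building a
list up front.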

Link: https://lkml.kernel.org/r/20241204165503.628784-1-koichiro.den@xxxxxxxxxxxxx
Signed-off-by: Koichiro Den <koichiro.den@xxxxxxxxxxxxx>
Cc: Aristeu Rozanski <aris@xxxxxxxxxx>
Cc: Aristeu Rozanski <aris@xxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/hugetlb.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/mm/hugetlb.c~hugetlb-prioritize-surplus-allocation-from-current-node
+++ a/mm/hugetlb.c
@@ -2463,7 +2463,13 @@ static int gather_surplus_pages(struct h
 	long needed, allocated;
 	bool alloc_ok = true;
 	int node;
-	nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	nodemask_t *mbind_nodemask, alloc_nodemask;
+
+	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	if (mbind_nodemask)
+		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
+	else
+		alloc_nodemask = cpuset_current_mems_allowed;
 
 	lockdep_assert_held(&hugetlb_lock);
 	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
@@ -2479,8 +2485,16 @@ retry:
 	spin_unlock_irq(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
 		folio = NULL;
-		for_each_node_mask(node, cpuset_current_mems_allowed) {
-			if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
+
+		/* Prioritize current node */
+		if (node_isset(numa_mem_id(), alloc_nodemask))
+			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+					numa_mem_id(), NULL);
+
+		if (!folio) {
+			for_each_node_mask(node, alloc_nodemask) {
+				if (node == numa_mem_id())
+					continue;
 				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
 						node, NULL);
 				if (folio)
_

Patches currently in -mm which might be from koichiro.den@xxxxxxxxxxxxx are

hugetlb-prioritize-surplus-allocation-from-current-node.patch