Re: [PATCH v1] mm/khugepaged: replace page_mapcount() check by folio_likely_mapped_shared()

John Hubbard <jhubbard@xxxxxxxxxx> · Wed, 24 Apr 2024 21:00:50 -0700

On 4/24/24 5:26 AM, David Hildenbrand wrote:

Hi David,

Overall, I think this looks good, just a few questions, and of course
some silly documentation nits.


We want to limit the use of page_mapcount() to places where absolutely
required, to prepare for kernel configs where we won't keep track of
per-page mapcounts in large folios.


Just curious, can you elaborate on the motivation? I probably missed
the discussions that explained why page_mapcount() in large folios
is not desirable. Are we getting rid of a field in struct page/folio?
Some other reason?

...
To summarize, in the common case, this change is not expected to matter
much. The more common application of khugepaged operates on

Based on the diffs (and some quick hacks for testing that I ran), I agree.

...

This really needs the folio_likely_mapped_shared() optimization [1] that
resides in mm-unstable, I think, to reduce "false negatives".

The khugepage MM selftests keep working as expected, including:

	Run test: collapse_max_ptes_shared (khugepaged:anon)
	Allocate huge page... OK
	Share huge page over fork()... OK
	Trigger CoW on page 255 of 512... OK
	Maybe collapse with max_ptes_shared exceeded.... OK
	Trigger CoW on page 256 of 512... OK
	Collapse with max_ptes_shared PTEs shared.... OK
	Check if parent still has huge page... OK

Well, a word of caution! These tests do not (yet) cover either of
the interesting new cases that folio_likely_mapped_shared() presents:
KSM or hugetlbfs interactions. In other words, false positives.



Where we check that collapsing in the parent behaves as expected after
COWing a lot of pages in the parent: a sane scenario that is essentially
unchanged and which does not depend on any action in the child process
(compared to the cases discussed in (B) above).

[1] https://lkml.kernel.org/r/20240409192301.907377-6-david@xxxxxxxxxx

---
  Documentation/admin-guide/mm/transhuge.rst |  3 ++-
  mm/khugepaged.c                            | 22 +++++++++++++++-------
  2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index f82300b9193fe..076443cc10a6c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -278,7 +278,8 @@ collapsed, resulting fewer pages being collapsed into
  THPs, and lower memory access performance.
  
  ``max_ptes_shared`` specifies how many pages can be shared across multiple
-processes. Exceeding the number would block the collapse::
+processes. khugepaged might treat pages of THPs as shared if any page of
+that THP is shared. Exceeding the number would block the collapse::
  
  	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
  
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2f73d2aa9ae84..cf518fc440982 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -583,7 +583,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
  		folio = page_folio(page);
  		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
  
-		if (page_mapcount(page) > 1) {
+		/* See hpage_collapse_scan_pmd(). */

Why? Because it has an identical code snippet?

I thought about asking if we should factor that out, just to
keep the policy the same. Thoughts?

+		if (folio_likely_mapped_shared(folio)) {
  			++shared;
  			if (cc->is_khugepaged &&
  			    shared > khugepaged_max_ptes_shared) {
@@ -1317,8 +1318,20 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
  			result = SCAN_PAGE_NULL;
  			goto out_unmap;
  		}
+		folio = page_folio(page);
  
-		if (page_mapcount(page) > 1) {
+		if (!folio_test_anon(folio)) {
+			result = SCAN_PAGE_ANON;
+			goto out_unmap;
+		}
+
+		/*
+		 * We treat a single page as shared if any part of the THP
+		 * is shared. "False negatives" from
+		 * folio_likely_mapped_shared() are not expected to matter
+		 * much in practice.

Maybe delete that second sentence? It is not really pulling its
weight here. :)


thanks,
--
John Hubbard
NVIDIA