Re: [PATCH v1 07/11] mm/huge_memory: convert split_huge_pages_pid() from follow_page() to folio_walk

David Hildenbrand <david@xxxxxxxxxx> · Wed, 7 Aug 2024 11:57:19 +0200

On 06.08.24 17:36, Zi Yan wrote:
On 6 Aug 2024, at 6:24, David Hildenbrand wrote:

On 06.08.24 12:03, David Hildenbrand wrote:
On 06.08.24 11:56, David Hildenbrand wrote:
On 06.08.24 11:46, Ryan Roberts wrote:
On 02/08/2024 16:55, David Hildenbrand wrote:
Let's remove yet another follow_page() user. Note that we have to do the
split without holding the PTL, after folio_walk_end(). We don't care
about losing the secretmem check in follow_page().

Hi David,

Our (arm64) CI is showing a regression in split_huge_page_test from mm selftests from next-20240805 onwards. Navigating around a couple of other lurking bugs, I was able to bisect to this change (which smells about right).

Newly failing test:

# # ------------------------------
# # running ./split_huge_page_test
# # ------------------------------
# # TAP version 13
# # 1..12
# # Bail out! Still AnonHugePages not split
# # # Planned tests != run tests (12 != 0)
# # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
# # [FAIL]
# not ok 52 split_huge_page_test # exit=1

It's trying to split some pmd-mapped THPs then checking and finding that they are not split. The split is requested via /sys/kernel/debug/split_huge_pages, which I believe ends up in this function you are modifying here. Although I'll admit that looking at the change, there is nothing obviously wrong! Any ideas?

Nothing jumps at me as well. Let me fire up the debugger :)

Ah, very likely the can_split_folio() check expects a raised refcount
already.

Indeed, the following does the trick! Thanks Ryan, I could have sworn
I ran that selftest as well.

TAP version 13
1..12
ok 1 Split huge pages successful
ok 2 Split PTE-mapped huge pages successful
# Please enable pr_debug in split_huge_pages_in_file() for more info.
# Please check dmesg for more information
ok 3 File-backed THP split test done

...


@Andrew, can you squash the following?


 From e5ea585de3e089ea89bf43d8447ff9fc9b371286 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@xxxxxxxxxx>
Date: Tue, 6 Aug 2024 12:08:17 +0200
Subject: [PATCH] fixup: mm/huge_memory: convert split_huge_pages_pid() from
  follow_page() to folio_walk

We have to teach can_split_folio() that we are not holding an additional
reference.

Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
  include/linux/huge_mm.h | 4 ++--
  mm/huge_memory.c        | 8 ++++----
  mm/vmscan.c             | 2 +-
  3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..ce44caa40eed 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -314,7 +314,7 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
  		unsigned long len, unsigned long pgoff, unsigned long flags,
  		vm_flags_t vm_flags);
  -bool can_split_folio(struct folio *folio, int *pextra_pins);
+bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
  		unsigned int new_order);
  static inline int split_huge_page(struct page *page)
@@ -470,7 +470,7 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
  }
   static inline bool
-can_split_folio(struct folio *folio, int *pextra_pins)
+can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
  {
  	return false;
  }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 697fcf89f975..c40b0dcc205b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3021,7 +3021,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
  }
   /* Racy check whether the huge page can be split */
-bool can_split_folio(struct folio *folio, int *pextra_pins)
+bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
  {
  	int extra_pins;
  @@ -3033,7 +3033,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins)
  		extra_pins = folio_nr_pages(folio);
  	if (pextra_pins)
  		*pextra_pins = extra_pins;
-	return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
+	return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - caller_pins;
  }
   /*
@@ -3201,7 +3201,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
  	 * Racy check if we can split the page, before unmap_folio() will
  	 * split PMDs
  	 */
-	if (!can_split_folio(folio, &extra_pins)) {
+	if (!can_split_folio(folio, 1, &extra_pins)) {
  		ret = -EAGAIN;
  		goto out_unlock;
  	}
@@ -3537,7 +3537,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
  		 * can be split or not. So skip the check here.
  		 */
  		if (!folio_test_private(folio) &&
-		    !can_split_folio(folio, NULL))
+		    !can_split_folio(folio, 0, NULL))
  			goto next;
   		if (!folio_trylock(folio))

The diff below can skip a folio with private and extra pin(s) early instead
of trying to lock and split it then failing at can_split_folio() inside
split_huge_page_to_list_to_order().

Maybe worth applying on top of yours?


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a218320a9233..ce992d54f1da 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3532,13 +3532,10 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
                         goto next;

                 total++;
-               /*
-                * For folios with private, split_huge_page_to_list_to_order()
-                * will try to drop it before split and then check if the folio
-                * can be split or not. So skip the check here.
-                */
-               if (!folio_test_private(folio) &&
-                   !can_split_folio(folio, 0, NULL))
+
+               if (!can_split_folio(folio,
+                                    folio_test_private(folio) ? 1 : 0,
+                                    NULL))

Hmm, it does look a bit odd. It's not something from the caller (caller_pins), but a
folio property. Likely should be handled differently.

In vmscan code, we only call can_split_folio() on anon folios where
folio_test_private() does not apply.

But indeed, in split_huge_page_to_list_to_order() we'd have to fail if
folio_test_private() still applies after

Not sure if that is really better:


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c40b0dcc205b..7cb743047566 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3026,11 +3026,14 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
        int extra_pins;
 
        /* Additional pins from page cache */
-       if (folio_test_anon(folio))
+       if (folio_test_anon(folio)) {
                extra_pins = folio_test_swapcache(folio) ?
                                folio_nr_pages(folio) : 0;
-       else
+       } else {
                extra_pins = folio_nr_pages(folio);
+               if (unlikely(folio_test_private(folio)))
+                       extra_pins++;
+       }
        if (pextra_pins)
                *pextra_pins = extra_pins;
        return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - caller_pins;
@@ -3199,9 +3202,11 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 
        /*
         * Racy check if we can split the page, before unmap_folio() will
-        * split PMDs
+        * split PMDs. filemap_release_folio() will try to free buffer; if that
+        * fails, filemap_release_folio() fails.
         */
-       if (!can_split_folio(folio, 1, &extra_pins)) {
+       if (WARN_ON_ONCE(folio_test_private(folio)) ||
+           !can_split_folio(folio, 1, &extra_pins)) {
                ret = -EAGAIN;
                goto out_unlock;
        }
@@ -3531,13 +3536,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
                        goto next;
 
                total++;
-               /*
-                * For folios with private, split_huge_page_to_list_to_order()
-                * will try to drop it before split and then check if the folio
-                * can be split or not. So skip the check here.
-                */
-               if (!folio_test_private(folio) &&
-                   !can_split_folio(folio, 0, NULL))
+               if (!can_split_folio(folio, 0, NULL))
                        goto next;
 
                if (!folio_trylock(folio))


It assumes that folio_set_private() is impossible after filemap_release_folio() succeeded
and we're still holding the folio lock. Then we could even get rid of the WARN_ON_ONCE().

--
Cheers,

David / dhildenb