Re: Patch "mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases" has been added to the 6.11-stable tree

On 11/17/24 1:47 PM, gregkh@xxxxxxxxxxxxxxxxxxx wrote:

This is a note to let you know that I've just added the patch titled

     mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

to the 6.11-stable tree which can be found at:
     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
      mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch
and it can be found in the queue-6.11 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


Hi,

During some related testing today, I experienced a kernel crash, and git
bisect points directly to the upstream commit corresponding to this
patch.

Specifically, when booting Linux 6.12 on x86 with "numa=fake=2
movablecore=4G" on the kernel command line, and then running the
following selftest, I get the crash shown below:

    tools/testing/selftests/mm/gup_longterm

So I think this is not stable material after all. In fact, the upstream
commit may even have to be reverted if I can't see what's wrong with it
very quickly.

So let's please drop the 6.11.y backport.


BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 176aef067 P4D 176aef067 PUD 105e63067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 UID: 0 PID: 1186 Comm: gup_longterm Not tainted 6.12.0-hubbard-github #192
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
Code: 00 00 16 00 00 eb 1f 49 ff c9 4d 89 c8 49 f7 00 00 08 00 00 0f 84 81 02 00 00 48 ff c1 48 39 ce 0f 84 6f 02 00 00 48 8b 04 cf 0
RSP: 0018:ffffc90005483cd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000160000000000 RSI: 0000000000000001 RDI: ffff888106b8e6e8
RBP: 00000000fffffff4 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000870a R11: 00000000000004fa R12: ffffea0010dc6dc0
R13: ffffea0010dc6dc8 R14: ffff888106b8e6e8 R15: 0000000000000001
FS:  00007f236b862740(0000) GS:ffff888499ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000105158003 CR4: 00000000003726f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die_body+0x66/0xb0
 ? page_fault_oops+0x30c/0x3b0
 ? do_user_addr_fault+0x6c3/0x720
 ? irqentry_enter+0x34/0x60
 ? exc_page_fault+0x68/0x100
 ? asm_exc_page_fault+0x22/0x30
 ? sanity_check_pinned_pages+0x3a/0x2d0
 unpin_user_pages+0x24/0xe0
 check_and_migrate_movable_pages_or_folios+0x455/0x4b0
 __gup_longterm_locked+0x3bf/0x820
 ? mmap_read_lock_killable+0x12/0x50
 ? __pfx_mmap_read_lock_killable+0x10/0x10
 pin_user_pages+0x66/0xa0
 gup_test_ioctl+0x358/0xb20
 __se_sys_ioctl+0x6b/0xc0
 do_syscall_64+0x7b/0x150
 entry_SYSCALL_64_after_hwframe+0x76/0x7e



 From 94efde1d15399f5c88e576923db9bcd422d217f2 Mon Sep 17 00:00:00 2001
From: John Hubbard <jhubbard@xxxxxxxxxx>
Date: Mon, 4 Nov 2024 19:29:44 -0800
Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

From: John Hubbard <jhubbard@xxxxxxxxxx>

commit 94efde1d15399f5c88e576923db9bcd422d217f2 upstream.

commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.

A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory on x86_64.  That's because, on a 4KB PAGE_SIZE
system, when user space tries to pin 2GB (indirectly, via a device driver
that calls pin_user_pages()), this requires allocating an array of folio
pointers of MAX_PAGE_ORDER size, which is the limit for kmalloc().

In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation.  The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.

Fix this by avoiding the new allocation entirely.  This is done by
referring to either the original page[i] within **pages, or to the
associated folio.  Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.

[jhubbard@xxxxxxxxxx: whitespace tweak, per David]
   Link: https://lkml.kernel.org/r/131cf9c8-ebc0-4cbb-b722-22fa8527bf3c@xxxxxxxxxx
[jhubbard@xxxxxxxxxx: bypass pofs_get_folio(), per Oscar]
   Link: https://lkml.kernel.org/r/c1587c7f-9155-45be-bd62-1e36c0dd6923@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20241105032944.141488-2-jhubbard@xxxxxxxxxx
Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>
Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
Cc: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
Cc: Dave Airlie <airlied@xxxxxxxxxx>
Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
Cc: Dongwon Kim <dongwon.kim@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Junxiao Chang <junxiao.chang@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
  mm/gup.c |  114 ++++++++++++++++++++++++++++++++++++++++++---------------------
  1 file changed, 77 insertions(+), 37 deletions(-)

--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2282,20 +2282,57 @@ struct page *get_dump_page(unsigned long
  #endif /* CONFIG_ELF_CORE */
 
  #ifdef CONFIG_MIGRATION
+
+/*
+ * An array of either pages or folios ("pofs"). Although it may seem tempting to
+ * avoid this complication, by simply interpreting a list of folios as a list of
+ * pages, that approach won't work in the longer term, because eventually the
+ * layouts of struct page and struct folio will become completely different.
+ * Furthermore, this pof approach avoids excessive page_folio() calls.
+ */
+struct pages_or_folios {
+	union {
+		struct page **pages;
+		struct folio **folios;
+		void **entries;
+	};
+	bool has_folios;
+	long nr_entries;
+};
+
+static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
+{
+	if (pofs->has_folios)
+		return pofs->folios[i];
+	return page_folio(pofs->pages[i]);
+}
+
+static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
+{
+	pofs->entries[i] = NULL;
+}
+
+static void pofs_unpin(struct pages_or_folios *pofs)
+{
+	if (pofs->has_folios)
+		unpin_folios(pofs->folios, pofs->nr_entries);
+	else
+		unpin_user_pages(pofs->pages, pofs->nr_entries);
+}
+
  /*
   * Returns the number of collected folios. Return value is always >= 0.
   */
  static unsigned long collect_longterm_unpinnable_folios(
-					struct list_head *movable_folio_list,
-					unsigned long nr_folios,
-					struct folio **folios)
+		struct list_head *movable_folio_list,
+		struct pages_or_folios *pofs)
  {
  	unsigned long i, collected = 0;
  	struct folio *prev_folio = NULL;
  	bool drain_allow = true;
 
-	for (i = 0; i < nr_folios; i++) {
-		struct folio *folio = folios[i];
+	for (i = 0; i < pofs->nr_entries; i++) {
+		struct folio *folio = pofs_get_folio(pofs, i);
 
  		if (folio == prev_folio)
  			continue;
@@ -2336,16 +2373,15 @@ static unsigned long collect_longterm_un
   * Returns -EAGAIN if all folios were successfully migrated or -errno for
   * failure (or partial success).
   */
-static int migrate_longterm_unpinnable_folios(
-					struct list_head *movable_folio_list,
-					unsigned long nr_folios,
-					struct folio **folios)
+static int
+migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
+				   struct pages_or_folios *pofs)
  {
  	int ret;
  	unsigned long i;
 
-	for (i = 0; i < nr_folios; i++) {
-		struct folio *folio = folios[i];
+	for (i = 0; i < pofs->nr_entries; i++) {
+		struct folio *folio = pofs_get_folio(pofs, i);
 
  		if (folio_is_device_coherent(folio)) {
  			/*
@@ -2353,7 +2389,7 @@ static int migrate_longterm_unpinnable_f
  			 * convert the pin on the source folio to a normal
  			 * reference.
  			 */
-			folios[i] = NULL;
+			pofs_clear_entry(pofs, i);
  			folio_get(folio);
  			gup_put_folio(folio, 1, FOLL_PIN);
@@ -2372,8 +2408,8 @@ static int migrate_longterm_unpinnable_f
  		 * calling folio_isolate_lru() which takes a reference so the
  		 * folio won't be freed if it's migrating.
  		 */
-		unpin_folio(folios[i]);
-		folios[i] = NULL;
+		unpin_folio(folio);
+		pofs_clear_entry(pofs, i);
  	}
 
  	if (!list_empty(movable_folio_list)) {
@@ -2396,12 +2432,26 @@ static int migrate_longterm_unpinnable_f
  	return -EAGAIN;
 
  err:
-	unpin_folios(folios, nr_folios);
+	pofs_unpin(pofs);
  	putback_movable_pages(movable_folio_list);
 
  	return ret;
  }
 
+static long
+check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
+{
+	LIST_HEAD(movable_folio_list);
+	unsigned long collected;
+
+	collected = collect_longterm_unpinnable_folios(&movable_folio_list,
+						       pofs);
+	if (!collected)
+		return 0;
+
+	return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
+}
+
  /*
   * Check whether all folios are *allowed* to be pinned indefinitely (longterm).
   * Rather confusingly, all folios in the range are required to be pinned via
@@ -2421,16 +2471,13 @@ err:
  static long check_and_migrate_movable_folios(unsigned long nr_folios,
  					     struct folio **folios)
  {
-	unsigned long collected;
-	LIST_HEAD(movable_folio_list);
+	struct pages_or_folios pofs = {
+		.folios = folios,
+		.has_folios = true,
+		.nr_entries = nr_folios,
+	};
 
-	collected = collect_longterm_unpinnable_folios(&movable_folio_list,
-						       nr_folios, folios);
-	if (!collected)
-		return 0;
-
-	return migrate_longterm_unpinnable_folios(&movable_folio_list,
-						  nr_folios, folios);
+	return check_and_migrate_movable_pages_or_folios(&pofs);
  }
 
  /*
@@ -2442,20 +2489,13 @@ static long check_and_migrate_movable_fo
  static long check_and_migrate_movable_pages(unsigned long nr_pages,
  					    struct page **pages)
  {
-	struct folio **folios;
-	long i, ret;
+	struct pages_or_folios pofs = {
+		.pages = pages,
+		.has_folios = false,
+		.nr_entries = nr_pages,
+	};
 
-	folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
-	if (!folios)
-		return -ENOMEM;
-
-	for (i = 0; i < nr_pages; i++)
-		folios[i] = page_folio(pages[i]);
-
-	ret = check_and_migrate_movable_folios(nr_pages, folios);
-
-	kfree(folios);
-	return ret;
+	return check_and_migrate_movable_pages_or_folios(&pofs);
  }
  #else
  static long check_and_migrate_movable_pages(unsigned long nr_pages,


Patches currently in stable-queue which might be from jhubbard@xxxxxxxxxx are

queue-6.11/mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch

thanks,
--
John Hubbard
