Re: [PATCH v2] vma remove the unneeded avc bound with non-CoWed folio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2024/8/28 1:35, David Hildenbrand 写道:
On 27.08.24 03:50, zhiguojiang wrote:


在 2024/8/27 1:24, David Hildenbrand 写道:
On 23.08.24 16:01, Zhiguo Jiang wrote:
After CoWed by do_wp_page, the vma established a new mapping
relationship
with the CoWed folio instead of the non-CoWed folio. However, regarding the situation where vma->anon_vma and the non-CoWed folio's anon_vma are
not same, the avc binding relationship between them will no longer be
needed, so it is issue for the avc binding relationship still existing
between them.

This patch will remove the avc binding relationship between vma and the
non-CoWed folio's anon_vma, which each has their own independent
anon_vma. It can also alleviates rmap overhead simultaneously.

Signed-off-by: Zhiguo Jiang <justinjiang@xxxxxxxx>
---
-v2:
   * Solve the kernel test robot noticed "WARNING"
     Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
     Closes:
https://lore.kernel.org/oe-lkp/202408230938.43f55b4-lkp@xxxxxxxxx
   * Update comments to more accurately describe this patch.

-v1:
https://lore.kernel.org/linux-mm/20240820143359.199-1-justinjiang@xxxxxxxx/

   include/linux/rmap.h |  1 +
   mm/memory.c          |  8 +++++++
   mm/rmap.c            | 53 ++++++++++++++++++++++++++++++++++++++++++++
   3 files changed, 62 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 91b5935e8485..8607d28a3146
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,6 +257,7 @@ void folio_remove_rmap_ptes(struct folio *,
struct page *, int nr_pages,
       folio_remove_rmap_ptes(folio, page, 1, vma)
   void folio_remove_rmap_pmd(struct folio *, struct page *,
           struct vm_area_struct *);
+void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
     void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
           unsigned long address, rmap_t flags);
diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d0..4c89cb1cb73e
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3428,6 +3428,14 @@ static vm_fault_t wp_page_copy(struct vm_fault
*vmf)
                * old page will be flushed before it can be reused.
                */
               folio_remove_rmap_pte(old_folio, vmf->page, vma);
+
+            /*
+             * If the new_folio's anon_vma is different from the
+             * old_folio's anon_vma, the avc binding relationship
+             * between vma and the old_folio's anon_vma is removed,
+             * avoiding rmap redundant overhead.
+             */
+            folio_remove_anon_avc(old_folio, vma);

... by increasing write fault latency, introducing an RMAP walk (!)? Hmm?

On the reuse path, we do a folio_move_anon_rmap(), to optimize that.

Thanks for your comments. This may not be a good fixup patch. The
resue patch folio_move_anon_rmap() seems to be exclusive or
_refcount = 1 folios. The fork() path seems to clear exclusive flag
in copy_page_range() --> ... --> __folio_try_dup_anon_rmap(). However,
I observed lots of orphan avcs by the above debug trace logs in
wp_page_copy(). But they may be not removed by discussing with Mika.

Was this patch ever tested? I cannot even boot a simple VM without an endless stream of

[    5.804598] ------------[ cut here ]------------
[    5.805494] WARNING: CPU: 11 PID: 595 at mm/rmap.c:443 unlink_anon_vmas+0x19b/0x1d0
[    5.806962] Modules linked in: qemu_fw_cfg
[    5.807762] CPU: 11 UID: 0 PID: 595 Comm: dracut-rootfs-g Tainted: G        W          6.11.0-rc4+ #72
[    5.809546] Tainted: [W]=WARN
[    5.810127] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[    5.811753] RIP: 0010:unlink_anon_vmas+0x19b/0x1d0
[    5.812680] Code: b0 00 00 00 00 75 1f f0 ff 8f a0 00 00 00 75 a2 e8 8a fd ff ff eb 9b 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d4 82 d0 00 0f 0b eb dd <0f> 0b eb cf 0f 0b 48 83 c7 08 e8 16 40 d7 ff e9 ea fe ff ff 48 8b
[    5.816247] RSP: 0018:ffffa19f43bb78d0 EFLAGS: 00010286
[    5.817258] RAX: ffff8a71c1bdd2d0 RBX: ffff8a71c1bdd2c0 RCX: ffff8a71c27a86c8 [    5.818624] RDX: 0000000000000001 RSI: ffff8a71c2771b28 RDI: ffff8a71c27a9e60 [    5.820011] RBP: dead000000000122 R08: 0000000000000000 R09: 0000000000000001 [    5.821380] R10: 0000000000000200 R11: 0000000000000001 R12: ffff8a71c2771b28 [    5.822748] R13: dead000000000100 R14: ffff8a71c2771b18 R15: ffff8a71c27a9e60 [    5.824122] FS:  0000000000000000(0000) GS:ffff8a7337980000(0000) knlGS:0000000000000000
[    5.825665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.826775] CR2: 00007fca7f70ac58 CR3: 00000001027b2004 CR4: 0000000000770ef0
[    5.828146] PKRU: 55555554
[    5.828686] Call Trace:
[    5.829169]  <TASK>
[    5.829594]  ? __warn.cold+0xb1/0x13e
[    5.830312]  ? unlink_anon_vmas+0x19b/0x1d0
[    5.831118]  ? report_bug+0xff/0x140
[    5.831840]  ? handle_bug+0x3c/0x80
[    5.832524]  ? exc_invalid_op+0x17/0x70
[    5.833262]  ? asm_exc_invalid_op+0x1a/0x20
[    5.834086]  ? unlink_anon_vmas+0x19b/0x1d0
[    5.834908]  free_pgtables+0x130/0x290
[    5.835661]  exit_mmap+0x19a/0x460
[    5.836351]  __mmput+0x4b/0x120
[    5.836965]  do_exit+0x2e1/0xac0
[    5.837601]  ? lock_release+0xd5/0x2c0
[    5.838343]  do_group_exit+0x36/0xa0
[    5.839035]  __x64_sys_exit_group+0x18/0x20
[    5.839866]  x64_sys_call+0x14b4/0x14c0
Arm64 machine tested it and no crashes detected. You may try the
attachment modifition provided by Lorenzo Stoakes. Can you please
check if there are any opportunities for further improvement?


Andrew, please remove this from mm-unstable.

Thanks
Zhiguo
From 973ce5f0aea78196088cd527905cc0fad40edb29 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Date: Sat, 24 Aug 2024 18:55:31 +0100
Subject: [RFC PATCH] mm: fixup orphan avc cleanup logic

Existing logic failed to reparent the anon_vma whose avc was removed which
resulted in assertion failures.

This patch corrects this, fixes up some comments, and does some other
cleanups.

We also do not do anything relating to anon_vma->parent manipulation if no
orphaned AVC is found.

I still feel this logic is highly dubious, but this does fix the issue with
anon_vma->num_children accounting.

This doesn't correctly handle locking of the reparented anon_vma.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
---
 include/linux/rmap.h |   2 +-
 mm/memory.c          |   2 +-
 mm/rmap.c            | 101 ++++++++++++++++++++++++++++---------------
 3 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8607d28a3146..f1a835f54064 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,7 +257,7 @@ void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
 	folio_remove_rmap_ptes(folio, page, 1, vma)
 void folio_remove_rmap_pmd(struct folio *, struct page *,
 		struct vm_area_struct *);
-void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
+void cleanup_orphan_avc(struct folio *, struct vm_area_struct *);
 
 void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address, rmap_t flags);
diff --git a/mm/memory.c b/mm/memory.c
index 4c89cb1cb73e..989b078dd860 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3435,7 +3435,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			 * between vma and the old_folio's anon_vma is removed,
 			 * avoiding rmap redundant overhead.
 			 */
-			folio_remove_anon_avc(old_folio, vma);
+			cleanup_orphan_avc(old_folio, vma);
 		}
 
 		/* Free the old page.. */
diff --git a/mm/rmap.c b/mm/rmap.c
index 56fc16fcf2a9..3ac264962917 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1523,56 +1523,87 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
 }
 
 /**
- * folio_remove_anon_avc - remove the avc binding relationship between
- * folio and vma with different anon_vmas.
- * @folio:	The folio with anon_vma to remove the binded avc from
- * @vma:	The vm area to remove the binded avc with folio's anon_vma
+ * cleanup_orphan_avc - remove the avc binding relationship between a parent
+ * folio and child vma with different anon_vmas which, due to an operation such
+ * as CoW'ing a folio, is no longer meaningful.
  *
- * The caller is currently used for CoWed scene.
+ * (insert ASCII diagrams and explanation here...)
+ *
+ * @old_folio:  The folio which contains the parent anon_vma which has an unneeded
+ *              avc binding.
+ * @new_vma:	The VMA which is unnecessarily bound to folio.
  */
-void folio_remove_anon_avc(struct folio *folio,
-		struct vm_area_struct *vma)
+void cleanup_orphan_avc(struct folio *old_folio, struct vm_area_struct *new_vma)
 {
-	struct anon_vma *anon_vma = folio_anon_vma(folio);
+	struct anon_vma *parent_anon_vma = folio_anon_vma(old_folio);
+	struct anon_vma *child_anon_vma = new_vma->anon_vma;
 	pgoff_t pgoff_start, pgoff_end;
 	struct anon_vma_chain *avc;
+	bool removed = false;
 
 	/*
-	 * Ensure that the vma's anon_vma and the folio's
-	 * anon_vma exist and are not same.
+	 * If this folio were not anonymous, folio_anon_vma() would have
+	 * returned NULL. Equally, if the parent and child anon_vma objects are
+	 * the same, then we have nothing to do here.
 	 */
-	if (!folio_test_anon(folio) || unlikely(!anon_vma) ||
-	    anon_vma == vma->anon_vma)
+	if (!parent_anon_vma || parent_anon_vma == child_anon_vma)
 		return;
 
-	pgoff_start = folio_pgoff(folio);
-	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
+	pgoff_start = folio_pgoff(old_folio);
+	pgoff_end = pgoff_start + folio_nr_pages(old_folio) - 1;
 
-	if (!anon_vma_trylock_write(anon_vma))
+	/* This is an optimistic attempt. */
+	if (!anon_vma_trylock_write(parent_anon_vma))
 		return;
 
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
-			pgoff_start, pgoff_end) {
-		/*
-		 * Find the avc associated with vma from the folio's
-		 * anon_vma and remove it.
-		 */
-		if (avc->vma == vma) {
-			anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
-			/*
-			 * When removing the avc with anon_vma that is
-			 * different from the parent anon_vma from parent
-			 * anon_vma->rb_root, the parent num_children
-			 * count value is needed to reduce one.
-			 */
-			anon_vma->num_children--;
+	/*
+	 * Iterate through all AVC's tied to the old folio, looking for the
+	 * redundant one pointing at the new VMA.
+	 */
+	anon_vma_interval_tree_foreach(avc, &parent_anon_vma->rb_root,
+				       pgoff_start, pgoff_end) {
+		if (avc->vma != new_vma)
+			continue;
 
-			list_del(&avc->same_vma);
-			anon_vma_chain_free(avc);
-			break;
-		}
+		/* Remove the unneeded avc. */
+		anon_vma_interval_tree_remove(avc, &parent_anon_vma->rb_root);
+		list_del(&avc->same_vma);
+		anon_vma_chain_free(avc);
+
+		removed = true;
+		break;
 	}
-	anon_vma_unlock_write(anon_vma);
+
+	if (!removed)
+		goto unlock;
+
+	/*
+	 * Removing an avc implies that the associated avc MAY no longer need
+	 * to point to its parent, and we need to reparent it.
+	 */
+
+	/*
+	 * If somehow we aren't already the child of the parent anon_vma, we
+	 * have nothing to do here.
+	 */
+	if (child_anon_vma->parent != parent_anon_vma)
+		goto unlock;
+
+	/* OK, we abandon our parent, and reparent to ourselves. */
+
+	parent_anon_vma->num_children--;
+
+	child_anon_vma->parent = child_anon_vma;
+	child_anon_vma->num_children++;
+
+	/*
+	 * Here we should probably reset the anon_vma->root, as per
+	 * anon_vma_ctor() but this feels icky and horrible. Bit weird to share
+	 * a lock with the old parent's root.
+	 */
+
+unlock:
+	anon_vma_unlock_write(parent_anon_vma);
 }
 
 static __always_inline void __folio_remove_rmap(struct folio *folio,
-- 
2.46.0


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux