On 27.08.24 03:50, zhiguojiang wrote:
On 2024/8/27 1:24, David Hildenbrand wrote:
On 23.08.24 16:01, Zhiguo Jiang wrote:
After do_wp_page() performs CoW, the vma maps the CoWed folio instead of
the non-CoWed folio. If vma->anon_vma and the non-CoWed folio's anon_vma
are not the same, the avc binding between them is no longer needed, so it
is a problem that this binding still exists.
This patch removes the avc binding between the vma and the non-CoWed
folio's anon_vma in the case where each has its own independent
anon_vma. This also reduces rmap overhead.
Signed-off-by: Zhiguo Jiang <justinjiang@xxxxxxxx>
---
-v2:
* Solve the kernel test robot noticed "WARNING"
Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
Closes: https://lore.kernel.org/oe-lkp/202408230938.43f55b4-lkp@xxxxxxxxx
* Update comments to more accurately describe this patch.
-v1:
https://lore.kernel.org/linux-mm/20240820143359.199-1-justinjiang@xxxxxxxx/
include/linux/rmap.h | 1 +
mm/memory.c | 8 +++++++
mm/rmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 62 insertions(+)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 91b5935e8485..8607d28a3146
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,6 +257,7 @@ void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_remove_rmap_ptes(folio, page, 1, vma)
void folio_remove_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
+void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d0..4c89cb1cb73e
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3428,6 +3428,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* old page will be flushed before it can be reused.
*/
folio_remove_rmap_pte(old_folio, vmf->page, vma);
+
+ /*
+ * If the new_folio's anon_vma is different from the
+ * old_folio's anon_vma, the avc binding relationship
+ * between vma and the old_folio's anon_vma is removed,
+ * avoiding redundant rmap overhead.
+ */
+ folio_remove_anon_avc(old_folio, vma);
... by increasing write fault latency, introducing an RMAP walk (!)? Hmm?
On the reuse path, we do a folio_move_anon_rmap(), to optimize that.
Thanks for your comments. This may not be a good fixup patch. On the
reuse path, folio_move_anon_rmap() seems to apply only to folios that are
exclusive or have _refcount = 1. The fork() path seems to clear the
exclusive flag in copy_page_range() --> ... --> __folio_try_dup_anon_rmap().
However, I observed lots of orphan avcs in the above debug trace logs in
wp_page_copy(). But after discussing with Mika, it seems they may not be
removable.
Was this patch ever tested? I cannot even boot a simple VM without an endless stream of
[ 5.804598] ------------[ cut here ]------------
[ 5.805494] WARNING: CPU: 11 PID: 595 at mm/rmap.c:443 unlink_anon_vmas+0x19b/0x1d0
[ 5.806962] Modules linked in: qemu_fw_cfg
[ 5.807762] CPU: 11 UID: 0 PID: 595 Comm: dracut-rootfs-g Tainted: G W 6.11.0-rc4+ #72
[ 5.809546] Tainted: [W]=WARN
[ 5.810127] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 5.811753] RIP: 0010:unlink_anon_vmas+0x19b/0x1d0
[ 5.812680] Code: b0 00 00 00 00 75 1f f0 ff 8f a0 00 00 00 75 a2 e8 8a fd ff ff eb 9b 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d4 82 d0 00 0f 0b eb dd <0f> 0b eb cf 0f 0b 48 83 c7 08 e8 16 40 d7 ff e9 ea fe ff ff 48 8b
[ 5.816247] RSP: 0018:ffffa19f43bb78d0 EFLAGS: 00010286
[ 5.817258] RAX: ffff8a71c1bdd2d0 RBX: ffff8a71c1bdd2c0 RCX: ffff8a71c27a86c8
[ 5.818624] RDX: 0000000000000001 RSI: ffff8a71c2771b28 RDI: ffff8a71c27a9e60
[ 5.820011] RBP: dead000000000122 R08: 0000000000000000 R09: 0000000000000001
[ 5.821380] R10: 0000000000000200 R11: 0000000000000001 R12: ffff8a71c2771b28
[ 5.822748] R13: dead000000000100 R14: ffff8a71c2771b18 R15: ffff8a71c27a9e60
[ 5.824122] FS: 0000000000000000(0000) GS:ffff8a7337980000(0000) knlGS:0000000000000000
[ 5.825665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.826775] CR2: 00007fca7f70ac58 CR3: 00000001027b2004 CR4: 0000000000770ef0
[ 5.828146] PKRU: 55555554
[ 5.828686] Call Trace:
[ 5.829169] <TASK>
[ 5.829594] ? __warn.cold+0xb1/0x13e
[ 5.830312] ? unlink_anon_vmas+0x19b/0x1d0
[ 5.831118] ? report_bug+0xff/0x140
[ 5.831840] ? handle_bug+0x3c/0x80
[ 5.832524] ? exc_invalid_op+0x17/0x70
[ 5.833262] ? asm_exc_invalid_op+0x1a/0x20
[ 5.834086] ? unlink_anon_vmas+0x19b/0x1d0
[ 5.834908] free_pgtables+0x130/0x290
[ 5.835661] exit_mmap+0x19a/0x460
[ 5.836351] __mmput+0x4b/0x120
[ 5.836965] do_exit+0x2e1/0xac0
[ 5.837601] ? lock_release+0xd5/0x2c0
[ 5.838343] do_group_exit+0x36/0xa0
[ 5.839035] __x64_sys_exit_group+0x18/0x20
[ 5.839866] x64_sys_call+0x14b4/0x14c0
Andrew, please remove this from mm-unstable.
--
Cheers,
David / dhildenb