Dear All,

The original mail with this patch is not available in lore, so I decided
to reply to this one.

On 03.10.2024 00:44, Andrew Morton wrote:
> The patch titled
>      Subject: mm/mremap: prevent racing change of old pmd type
> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>      mm-mremap-prevent-racing-change-of-old-pmd-type.patch
>
> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mremap-prevent-racing-change-of-old-pmd-type.patch
>
> This patch will later appear in the mm-hotfixes-unstable branch at
>     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
>
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
>
> ------------------------------------------------------
> From: Jann Horn <jannh@xxxxxxxxxx>
> Subject: mm/mremap: prevent racing change of old pmd type
> Date: Wed, 02 Oct 2024 23:07:06 +0200
>
> Prevent move_normal_pmd() in mremap() from racing with
> retract_page_tables() in MADVISE_COLLAPSE such that
>
>     pmd_populate(mm, new_pmd, pmd_pgtable(pmd))
>
> operates on an empty source pmd, causing creation of a new pmd which maps
> physical address 0 as a page table.
>
> This bug is only reachable if either CONFIG_READ_ONLY_THP_FOR_FS is set or
> THP shmem is usable.  (Unprivileged namespaces can be used to set up a
> tmpfs that can contain THP shmem pages with "huge=advise".)
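To illustrate the check-then-act race described above, here is a minimal
userspace model in C. It is not kernel code: struct pmd_slot, rmap_lock,
retract_table() and move_table() are hypothetical names, and a C11
atomic_flag spinlock stands in for the rmap lock. What it tries to show is
that the lock is taken before the source slot is read and held until the
move is complete, so a concurrent retract cannot empty the slot in between;
when there is nothing to move, the mover bails out instead of installing an
empty table.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one PMD slot: NULL means "no page table". */
struct pmd_slot {
    void *table;
};

/* Hypothetical stand-in for the rmap lock: a simple C11 spinlock. */
static atomic_flag rmap_lock = ATOMIC_FLAG_INIT;

static void rmap_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&rmap_lock, memory_order_acquire))
        ; /* spin until the current holder clears the flag */
}

static void rmap_lock_release(void)
{
    atomic_flag_clear_explicit(&rmap_lock, memory_order_release);
}

/* Models the collapse side: detaches the page table from the slot. */
static void *retract_table(struct pmd_slot *slot)
{
    void *old;

    rmap_lock_acquire();
    old = slot->table;
    slot->table = NULL;
    rmap_lock_release();
    return old;
}

/*
 * Models the move side. The lock is taken *before* the source slot is
 * read and released only after the move, so retract_table() cannot empty
 * the slot between the check and the move. Returns false if there was
 * nothing to move, instead of installing an empty (NULL) table.
 */
static bool move_table(struct pmd_slot *old_slot, struct pmd_slot *new_slot)
{
    bool moved = false;

    rmap_lock_acquire();
    if (old_slot->table != NULL) {
        new_slot->table = old_slot->table;
        old_slot->table = NULL;
        moved = true;
    }
    rmap_lock_release();
    return moved;
}
```

If the mover instead read the slot without the lock and only locked for the
write, it could observe a non-empty slot, lose the race to retract_table(),
and then publish a stale entry. This only models the idea; the actual patch
adjusts the rmap locking around move_normal_pmd().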
>
> If userspace triggers this bug *in multiple processes*, this could likely
> be used to create stale TLB entries pointing to freed pages or cause
> kernel UAF by breaking an invariant the rmap code relies on.
>
> Fix it by moving the rmap locking up so that it covers the span from
> reading the PMD entry to moving the page table.
>
> Link: https://lkml.kernel.org/r/20241002-move_normal_pmd-vs-collapse-fix-v1-1-78290e5dece6@xxxxxxxxxx
> Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

This patch landed in today's linux-next as commit 46c1b3279220
("mm/mremap: prevent racing change of old pmd type"). In my tests I found
that it introduces a lockdep warning about a possible circular locking
dependency on ARM64 machines. Reverting $subject together with commits
a2fbe16f45a8 ("mm: mremap: move_ptes() use pte_offset_map_rw_nolock()")
and 46c1b3279220 ("mm/mremap: prevent racing change of old pmd type") on
top of next-20241004 fixes this problem.

Here is the observed lockdep warning:

Freeing unused kernel memory: 13824K
Run /sbin/init as init process

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc1+ #15391 Not tainted
------------------------------------------------------
init/1 is trying to acquire lock:
ffff000006943588 (&anon_vma->rwsem){+.+.}-{3:3}, at: vma_prepare+0x70/0x158

but task is already holding lock:
ffff0000048c9970 (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: vma_prepare+0x28/0x158

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&mapping->i_mmap_rwsem){+.+.}-{3:3}:
       down_write+0x50/0xe8
       dma_resv_lockdep+0x140/0x300
       do_one_initcall+0x68/0x300
       kernel_init_freeable+0x28c/0x50c
       kernel_init+0x20/0x1d8
       ret_from_fork+0x10/0x20

-> #1 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0xd0/0xe4
       __alloc_pages_noprof+0xe4/0x10d0
       alloc_pages_mpol_noprof+0x88/0x23c
       alloc_pages_noprof+0x48/0xc0
       __pud_alloc+0x44/0x254
       alloc_new_pud.constprop.0+0x154/0x160
       move_page_tables+0x1b0/0xc38
       relocate_vma_down+0xe4/0x1f8
       setup_arg_pages+0x190/0x370
       load_elf_binary+0x370/0x15c4
       bprm_execve+0x290/0x7a0
       kernel_execve+0xf8/0x16c
       run_init_process+0xa8/0xbc
       kernel_init+0xec/0x1d8
       ret_from_fork+0x10/0x20

-> #0 (&anon_vma->rwsem){+.+.}-{3:3}:
       __lock_acquire+0x1374/0x2224
       lock_acquire+0x200/0x340
       down_write+0x50/0xe8
       vma_prepare+0x70/0x158
       __split_vma+0x26c/0x388
       vma_modify+0x45c/0x7f4
       vma_modify_flags+0x90/0xc4
       mprotect_fixup+0x8c/0x2c0
       do_mprotect_pkey+0x2a8/0x464
       __arm64_sys_mprotect+0x20/0x30
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe8
       do_el0_svc_compat+0x20/0x3c
       el0_svc_compat+0x44/0xe0
       el0t_32_sync_handler+0x98/0x148
       el0t_32_sync+0x194/0x198

other info that might help us debug this:

Chain exists of:
  &anon_vma->rwsem --> fs_reclaim --> &mapping->i_mmap_rwsem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mapping->i_mmap_rwsem);
                               lock(fs_reclaim);
                               lock(&mapping->i_mmap_rwsem);
  lock(&anon_vma->rwsem);

 *** DEADLOCK ***

2 locks held by init/1:
 #0: ffff000006998188 (&mm->mmap_lock){++++}-{3:3}, at: do_mprotect_pkey+0xb4/0x464
 #1: ffff0000048c9970 (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: vma_prepare+0x28/0x158

stack backtrace:
CPU: 1 UID: 0 PID: 1 Comm: init Not tainted 6.12.0-rc1+ #15391
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x94/0xec
 show_stack+0x18/0x24
 dump_stack_lvl+0x90/0xd0
 dump_stack+0x18/0x24
 print_circular_bug+0x298/0x37c
 check_noncircular+0x15c/0x170
 __lock_acquire+0x1374/0x2224
 lock_acquire+0x200/0x340
 down_write+0x50/0xe8
 vma_prepare+0x70/0x158
 __split_vma+0x26c/0x388
 vma_modify+0x45c/0x7f4
 vma_modify_flags+0x90/0xc4
 mprotect_fixup+0x8c/0x2c0
 do_mprotect_pkey+0x2a8/0x464
 __arm64_sys_mprotect+0x20/0x30
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0x40/0xe8
 do_el0_svc_compat+0x20/0x3c
 el0_svc_compat+0x44/0xe0
 el0t_32_sync_handler+0x98/0x148
 el0t_32_sync+0x194/0x198
INIT: version 2.88 booting

> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
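For anyone less used to reading lockdep splats: the "Chain exists of" line
in the report records two earlier orderings (fs_reclaim taken while
&anon_vma->rwsem was held, in move_page_tables(), and
&mapping->i_mmap_rwsem taken under fs_reclaim, via dma_resv_lockdep()).
The new acquisition of &anon_vma->rwsem while holding
&mapping->i_mmap_rwsem in vma_prepare() then closes the loop. The toy C
sketch below models only that core idea as a directed graph of lock classes
with a reachability check. The enum, dep[] and the function names are made
up for illustration, and real lockdep tracks far more state (read/write
modes, IRQ contexts).

```c
#include <stdbool.h>
#include <string.h>

/* Lock classes from the report; the indices are arbitrary (illustrative). */
enum lock_class { ANON_VMA_RWSEM, FS_RECLAIM, I_MMAP_RWSEM, NR_CLASSES };

/* dep[a][b] == true records "b was acquired while a was held". */
static bool dep[NR_CLASSES][NR_CLASSES];

static void record_dependency(enum lock_class held, enum lock_class acquired)
{
    dep[held][acquired] = true;
}

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *visited)
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < NR_CLASSES; next++)
        if (dep[from][next] && !visited[next] && reachable(next, to, visited))
            return true;
    return false;
}

/*
 * Adding the edge held -> acquired closes a cycle iff 'held' is already
 * reachable from 'acquired' through previously recorded dependencies.
 */
static bool would_deadlock(enum lock_class held, enum lock_class acquired)
{
    bool visited[NR_CLASSES];

    memset(visited, 0, sizeof(visited));
    return reachable(acquired, held, visited);
}
```

Recording anon_vma -> fs_reclaim and fs_reclaim -> i_mmap_rwsem and then
asking would_deadlock(I_MMAP_RWSEM, ANON_VMA_RWSEM) reports a cycle, which
is the situation the splat describes; the reverse ordering does not.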