Re: + mm-mremap-prevent-racing-change-of-old-pmd-type.patch added to mm-hotfixes-unstable branch

Dear All,

The original mail with this patch is not available on lore, so I decided 
to reply to this one.

On 03.10.2024 00:44, Andrew Morton wrote:
> The patch titled
>       Subject: mm/mremap: prevent racing change of old pmd type
> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>       mm-mremap-prevent-racing-change-of-old-pmd-type.patch
>
> This patch will shortly appear at
>       https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mremap-prevent-racing-change-of-old-pmd-type.patch
>
> This patch will later appear in the mm-hotfixes-unstable branch at
>      git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Before you just go and hit "reply", please:
>     a) Consider who else should be cc'ed
>     b) Prefer to cc a suitable mailing list as well
>     c) Ideally: find the original patch on the mailing list and do a
>        reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
>
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
>
> ------------------------------------------------------
> From: Jann Horn <jannh@xxxxxxxxxx>
> Subject: mm/mremap: prevent racing change of old pmd type
> Date: Wed, 02 Oct 2024 23:07:06 +0200
>
> Prevent move_normal_pmd() in mremap() from racing with
> retract_page_tables() in MADVISE_COLLAPSE such that
>
>      pmd_populate(mm, new_pmd, pmd_pgtable(pmd))
>
> operates on an empty source pmd, causing creation of a new pmd which maps
> physical address 0 as a page table.
>
> This bug is only reachable if either CONFIG_READ_ONLY_THP_FOR_FS is set or
> THP shmem is usable.  (Unprivileged namespaces can be used to set up a
> tmpfs that can contain THP shmem pages with "huge=advise".)
>
> If userspace triggers this bug *in multiple processes*, this could likely
> be used to create stale TLB entries pointing to freed pages or cause
> kernel UAF by breaking an invariant the rmap code relies on.
>
> Fix it by moving the rmap locking up so that it covers the span from
> reading the PMD entry to moving the page table.
>
> Link: https://lkml.kernel.org/r/20241002-move_normal_pmd-vs-collapse-fix-v1-1-78290e5dece6@xxxxxxxxxx
> Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

This patch landed in today's linux-next as commit 46c1b3279220 
("mm/mremap: prevent racing change of old pmd type"). In my tests I 
found that it introduces a lockdep warning about a possible circular 
locking dependency on ARM64 machines. Reverting commits a2fbe16f45a8 
("mm: mremap: move_ptes() use pte_offset_map_rw_nolock()") and 
46c1b3279220 ("mm/mremap: prevent racing change of old pmd type") on 
top of next-20241004 fixes this problem.

Here is the observed lockdep warning:

Freeing unused kernel memory: 13824K
Run /sbin/init as init process

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc1+ #15391 Not tainted
------------------------------------------------------
init/1 is trying to acquire lock:
ffff000006943588 (&anon_vma->rwsem){+.+.}-{3:3}, at: vma_prepare+0x70/0x158

but task is already holding lock:
ffff0000048c9970 (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: vma_prepare+0x28/0x158

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&mapping->i_mmap_rwsem){+.+.}-{3:3}:
        down_write+0x50/0xe8
        dma_resv_lockdep+0x140/0x300
        do_one_initcall+0x68/0x300
        kernel_init_freeable+0x28c/0x50c
        kernel_init+0x20/0x1d8
        ret_from_fork+0x10/0x20

-> #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire+0xd0/0xe4
        __alloc_pages_noprof+0xe4/0x10d0
        alloc_pages_mpol_noprof+0x88/0x23c
        alloc_pages_noprof+0x48/0xc0
        __pud_alloc+0x44/0x254
        alloc_new_pud.constprop.0+0x154/0x160
        move_page_tables+0x1b0/0xc38
        relocate_vma_down+0xe4/0x1f8
        setup_arg_pages+0x190/0x370
        load_elf_binary+0x370/0x15c4
        bprm_execve+0x290/0x7a0
        kernel_execve+0xf8/0x16c
        run_init_process+0xa8/0xbc
        kernel_init+0xec/0x1d8
        ret_from_fork+0x10/0x20

-> #0 (&anon_vma->rwsem){+.+.}-{3:3}:
        __lock_acquire+0x1374/0x2224
        lock_acquire+0x200/0x340
        down_write+0x50/0xe8
        vma_prepare+0x70/0x158
        __split_vma+0x26c/0x388
        vma_modify+0x45c/0x7f4
        vma_modify_flags+0x90/0xc4
        mprotect_fixup+0x8c/0x2c0
        do_mprotect_pkey+0x2a8/0x464
        __arm64_sys_mprotect+0x20/0x30
        invoke_syscall+0x48/0x110
        el0_svc_common.constprop.0+0x40/0xe8
        do_el0_svc_compat+0x20/0x3c
        el0_svc_compat+0x44/0xe0
        el0t_32_sync_handler+0x98/0x148
        el0t_32_sync+0x194/0x198

other info that might help us debug this:

Chain exists of:
   &anon_vma->rwsem --> fs_reclaim --> &mapping->i_mmap_rwsem

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&mapping->i_mmap_rwsem);
                                lock(fs_reclaim);
                                lock(&mapping->i_mmap_rwsem);
   lock(&anon_vma->rwsem);

  *** DEADLOCK ***

2 locks held by init/1:
  #0: ffff000006998188 (&mm->mmap_lock){++++}-{3:3}, at: do_mprotect_pkey+0xb4/0x464
  #1: ffff0000048c9970 (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: vma_prepare+0x28/0x158

stack backtrace:
CPU: 1 UID: 0 PID: 1 Comm: init Not tainted 6.12.0-rc1+ #15391
Hardware name: linux,dummy-virt (DT)
Call trace:
  dump_backtrace+0x94/0xec
  show_stack+0x18/0x24
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  print_circular_bug+0x298/0x37c
  check_noncircular+0x15c/0x170
  __lock_acquire+0x1374/0x2224
  lock_acquire+0x200/0x340
  down_write+0x50/0xe8
  vma_prepare+0x70/0x158
  __split_vma+0x26c/0x388
  vma_modify+0x45c/0x7f4
  vma_modify_flags+0x90/0xc4
  mprotect_fixup+0x8c/0x2c0
  do_mprotect_pkey+0x2a8/0x464
  __arm64_sys_mprotect+0x20/0x30
  invoke_syscall+0x48/0x110
  el0_svc_common.constprop.0+0x40/0xe8
  do_el0_svc_compat+0x20/0x3c
  el0_svc_compat+0x44/0xe0
  el0t_32_sync_handler+0x98/0x148
  el0t_32_sync+0x194/0x198
INIT: version 2.88 booting
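
To make the report easier to read, the dependency chain above can be modeled as a directed graph of "held, then acquired" edges. The following is only a minimal user-space sketch in Python, not kernel code. The helper name find_cycle and the tuple-based graph are illustrative assumptions, and real lockdep tracks lock classes with considerably more state. It shows that the chain lockdep had already recorded is acyclic on its own, and that the acquisition attempted in vma_prepare() (taking &anon_vma->rwsem while holding &mapping->i_mmap_rwsem) is the edge that closes the loop.

```python
def find_cycle(edges, start):
    """Return a path start -> ... -> start if the edges contain one, else None.

    Each edge is a (held, acquired) pair: 'acquired' was taken while
    'held' was already held.
    """
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)

    def dfs(node, path, visited):
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in visited:
                visited.add(nxt)
                found = dfs(nxt, path + [nxt], visited)
                if found:
                    return found
        return None

    return dfs(start, [start], {start})


# Dependencies already recorded, copied from "Chain exists of:" in the report.
existing = [
    ("&anon_vma->rwsem", "fs_reclaim"),
    ("fs_reclaim", "&mapping->i_mmap_rwsem"),
]

# New dependency from the report: init/1 holds i_mmap_rwsem and
# tries to acquire anon_vma->rwsem in vma_prepare().
new_edge = ("&mapping->i_mmap_rwsem", "&anon_vma->rwsem")

cycle = find_cycle(existing + [new_edge], "&anon_vma->rwsem")
print(" --> ".join(cycle) if cycle else "no cycle")
```

Without new_edge the function returns None, which matches the report: the warning only fires once the third dependency is added.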

> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland




