Re: [v2 PATCH] mm: vma: skip anonymous vma when inserting vma to file rmap tree

On Wed, Mar 12, 2025 at 08:04:23PM -0700, Yang Shi wrote:
>
>
> On 3/12/25 4:55 PM, Vasily Gorbik wrote:
> > On Wed, Mar 12, 2025 at 03:15:21PM -0700, Yang Shi wrote:
> > > LKP reported 800% performance improvement for small-allocs benchmark
> > > from vm-scalability [1] with patch ("/dev/zero: make private mapping
> > > full anonymous mapping") [2], but the patch was nack'ed since it changes
> > > the output of smaps somewhat.
> > ...
> > > ---
> > > v2:
> > >     * Added the comments in code suggested by Lorenzo
> > >     * Collected R-b from Lorenzo
> > >
> > >   mm/vma.c | 18 ++++++++++++++++--
> > >   1 file changed, 16 insertions(+), 2 deletions(-)
> > Hi Yang,
> >
> > Replying to v2, as the code is the same as v1 in linux-next:
> >
> > The LTP test "mmap10" consistently triggers a kernel NULL pointer
> > dereference with this change, at least on x86 and s390. Reverting just
> > this single patch from linux-next fixes the issue.
>
> Hi Vasily,
>
> Thanks for the report. This happens because dup_mmap() decides whether to
> insert the VMA into the file rmap tree by checking whether vma->vm_file is
> NULL. The splat can be avoided by skipping anonymous VMAs there, but that
> exposes a more severe problem: the struct file refcount may become
> imbalanced. The refcount is inc'ed in mmap, then inc'ed again by fork(),
> and dec'ed on unmap or process exit. If we skip the refcount inc in fork,
> we need to skip the refcount dec in unmap too, but there is still one
> refcount left from mmap.
>
> Can we dec the refcount in mmap once we see that it ends up being an
> anonymous vma? Unfortunately, no. If the refcount reaches 0, the struct
> file will be freed. We will run into UAF when looking up smaps IIUC, since
> vm_file may then point to anything.
>
> Lorenzo,
>
> This problem seems more complicated than I thought in the first place.
> Making it a real anonymous vma (vm_file is NULL) may still be the best
> option. But we need to figure out how we can keep smaps compatible.

Ugh lord. I am not in favour of this for reasons aforementioned, and I _really_
don't want to special case this any more than we already do...

Let me think a bit about this also.

Maybe if you're at LSF we can chat about it there?

Thanks!

>
> Andrew,
>
> Can you please drop this patch from your tree?
>
> Thanks,
> Yang
>
> >
> > LTP: starting mmap10
> > BUG: kernel NULL pointer dereference, address: 0000000000000008
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 800000010d22a067 P4D 800000010d22a067 PUD 11ff09067 PMD 0
> > Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 5 UID: 0 PID: 1719 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #3
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
> > RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
> > Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
> > RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
> > RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
> > RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
> > RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
> > R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
> > R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
> > FS:  00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0
> > Call Trace:
> >   <TASK>
> >   ? __die_body.cold+0x19/0x2b
> >   ? page_fault_oops+0xc4/0x1f0
> >   ? search_extable+0x26/0x30
> >   ? search_module_extables+0x3f/0x60
> >   ? exc_page_fault+0x6b/0x150
> >   ? asm_exc_page_fault+0x26/0x30
> >   ? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
> >   ? __pfx_vma_interval_tree_augment_rotate+0x10/0x10
> >   ? __rb_insert_augmented+0x2b/0x1d0
> >   copy_mm+0x48a/0x8c0
> >   copy_process+0xf98/0x1930
> >   kernel_clone+0xb7/0x3b0
> >   __do_sys_clone+0x65/0x90
> >   do_syscall_64+0x9e/0x1a0
> >   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7ff643eb2b00
> > Code: 31 c0 31 d2 31 f6 bf 11 00 20 01 48 89 e5 53 48 83 ec 08 64 48 8b 04 25 10 00 00 00 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 89 c3 85 c0 75 31 64 48 8b 04 25 10 00 00
> > RSP: 002b:00007ffdac219010 EFLAGS: 00000202 ORIG_RAX: 0000000000000038
> > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff643eb2b00
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
> > RBP: 00007ffdac219020 R08: 0000000000000000 R09: 0000000000000000
> > R10: 00007ff643df1a10 R11: 0000000000000202 R12: 0000000000000001
> > R13: 0000000000000000 R14: 00007ff644036000 R15: 0000000000000000
> >   </TASK>
> > Modules linked in:
> > CR2: 0000000000000008
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:__rb_insert_augmented+0x2b/0x1d0
> > Code: 0f 1e fa 48 89 f8 48 8b 3f 48 85 ff 0f 84 a4 01 00 00 41 55 49 89 f5 41 54 49 89 d4 55 53 48 8b 1f f6 c3 01 0f 85 e1 00 00 00 <48> 8b 53 08 48 39 fa 74 67 48 85 d2 74 09 f6 02 01 0f 84 a0 00 00
> > RSP: 0018:ffffc90002b47cc8 EFLAGS: 00010246
> > RAX: ffff8881143ab788 RBX: 0000000000000000 RCX: 00000000000009ff
> > RDX: ffffffff814ad5d0 RSI: ffff888100bb5060 RDI: ffff8881143ab088
> > RBP: ffff8881053af8c0 R08: ffff8881143ab700 R09: 00007ff6433f2000
> > R10: 00007ff6433f2000 R11: ffff8881143ab000 R12: ffffffff814ad5d0
> > R13: ffff888100bb5060 R14: ffff8881143ab700 R15: ffff8881143ab000
> > FS:  00007ff643df1740(0000) GS:ffff8882b45bf000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 000000011b042000 CR4: 00000000000006f0
> >
> >
> >
> > LTP: starting mmap10
> > Unable to handle kernel pointer dereference in virtual kernel address space
> > Failing address: 0000000000000000 TEID: 0000000000000483
> > Fault in home space mode while using kernel ASCE.
> > AS:000000000247c007 R3:00000001ffffc007 S:00000001ffffb801 P:000000000000013d
> > Oops: 0004 ilc:3 [#1] SMP
> > Modules linked in:
> > CPU: 0 UID: 0 PID: 665 Comm: mmap10 Not tainted 6.14.0-rc6-next-20250312 #16
> > Hardware name: IBM 3931 A01 704 (KVM/Linux)
> > Krnl PSW : 0704c00180000000 000003ffe0ee0440 (__rb_insert_augmented+0x60/0x210)
> >             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > Krnl GPRS: 00000000009ff000 0000000000000000 000000008e5f7508 0000000084a7ed08
> >             00000000000009fe 0000000000000000 0000000000000000 0000037fe06c7b68
> >             00000000801d0e90 000003ffe04158d0 0000000084a7ed08 0000000000000000
> >             000003ffbb700000 00000000801d0e48 000003ffe0ee057c 0000037fe06c7a40
> > Krnl Code: 000003ffe0ee0430: e31030080004        lg      %r1,8(%r3)
> >             000003ffe0ee0436: ec1200888064        cgrj    %r1,%r2,8,000003ffe0ee0546
> >            #000003ffe0ee043c: b90400a3            lgr     %r10,%r3
> >            >000003ffe0ee0440: e310b0100024        stg     %r1,16(%r11)
> >             000003ffe0ee0446: e3b030080024        stg     %r11,8(%r3)
> >             000003ffe0ee044c: ec180009007c        cgij    %r1,0,8,000003ffe0ee045e
> >             000003ffe0ee0452: ec2b000100d9        aghik   %r2,%r11,1
> >             000003ffe0ee0458: e32010000024        stg     %r2,0(%r1)
> > Call Trace:
> >   [<000003ffe0ee0440>] __rb_insert_augmented+0x60/0x210
> >   [<000003ffe016d6c4>] dup_mmap+0x424/0x8c0
> >   [<000003ffe016dc62>] copy_mm+0x102/0x1c0
> >   [<000003ffe016e8ae>] copy_process+0x7ce/0x12b0
> >   [<000003ffe016f458>] kernel_clone+0x68/0x380
> >   [<000003ffe016f84a>] __do_sys_clone+0x5a/0x70
> >   [<000003ffe016faa0>] __s390x_sys_clone+0x40/0x50
> >   [<000003ffe011c9b6>] do_syscall.constprop.0+0x116/0x140
> >   [<000003ffe0ef1d64>] __do_syscall+0xd4/0x1c0
> >   [<000003ffe0efd044>] system_call+0x74/0x98
> > Last Breaking-Event-Address:
> >   [<000003ffe0ee058a>] __rb_insert_augmented+0x1aa/0x210
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
>
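
For reference, the refcount imbalance Yang describes above can be modeled in
userspace. This is only a minimal sketch, not kernel code: every type and
helper here (model_file, model_vma, model_get, model_put, and so on) is a
hypothetical stand-in, and the second function assumes the reference taken
by mmap is the only one left (for example, the fd has already been closed).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: hypothetical stand-ins, not the kernel's
 * struct file or struct vm_area_struct. */
struct model_file {
    int refcount;   /* models the struct file reference count */
    bool freed;     /* set once the count drops to zero */
};

struct model_vma {
    struct model_file *vm_file; /* still set for the "anonymous" case */
    bool anonymous;
};

static void model_get(struct model_file *f) { f->refcount++; }

static void model_put(struct model_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->freed = true;
}

/* mmap takes one reference on behalf of the new VMA. */
static void model_mmap(struct model_vma *vma, struct model_file *f, bool anon)
{
    vma->vm_file = f;
    vma->anonymous = anon;
    model_get(f);
}

/* fork duplicates the VMA; skip_anon models skipping anonymous VMAs. */
static void model_fork(struct model_vma *child, const struct model_vma *parent,
                       bool skip_anon)
{
    *child = *parent;
    if (!(skip_anon && parent->anonymous))
        model_get(parent->vm_file);
}

/* unmap must skip the put if fork skipped the get. */
static void model_unmap(struct model_vma *vma, bool skip_anon)
{
    if (!(skip_anon && vma->anonymous))
        model_put(vma->vm_file);
}

/* References left after mmap, fork, and both unmaps. */
int model_leaked_refs(bool skip_anon)
{
    struct model_file f = { 0, false };
    struct model_vma parent, child;

    model_mmap(&parent, &f, true);
    model_fork(&child, &parent, skip_anon);
    model_unmap(&child, skip_anon);
    model_unmap(&parent, skip_anon);
    return f.refcount;
}

/* Dropping the mmap reference early: does the VMA now point at a freed file?
 * Assumes no other holder of the file remains. */
bool model_dangling_after_mmap_put(void)
{
    struct model_file f = { 0, false };
    struct model_vma vma;

    model_mmap(&vma, &f, true);
    model_put(&f);              /* the "dec refcount in mmap" idea */
    return vma.vm_file->freed;  /* vm_file is still set, so smaps could follow it */
}
```

In this model, skipping anonymous VMAs in both fork and unmap leaves the
reference taken in mmap outstanding, while dropping that reference in mmap
leaves vm_file pointing at a freed object, which matches the two problems
described in the quoted mail.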



