On Thu, Mar 20, 2025 at 9:02 PM Pedro Falcato <pfalcato@xxxxxxx> wrote: > On Thu, Mar 20, 2025 at 12:09:36PM -0700, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: eb88e6bfbc0a Merge tag 'fsnotify_for_v6.14-rc7' of git://g.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=11e6c83f980000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=77423669c2b8fa9 > > dashboard link: https://syzkaller.appspot.com/bug?extid=20ed41006cf9d842c2b5 > > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > > userspace arch: i386 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > Downloadable assets: > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-eb88e6bf.raw.xz > > vmlinux: https://storage.googleapis.com/syzbot-assets/ded0ce69669f/vmlinux-eb88e6bf.xz > > kernel image: https://storage.googleapis.com/syzbot-assets/6e6fa3c719e7/bzImage-eb88e6bf.xz > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+20ed41006cf9d842c2b5@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 > > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > </TASK> > > BUG: unable to handle page fault for address: fffffffffffffff4 > > #PF: supervisor read access in kernel mode > > #PF: error_code(0x0000) - not-present page > > PGD df84067 P4D df84067 PUD df86067 PMD 0 > > Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI > > CPU: 1 UID: 0 PID: 17805 Comm: syz.8.3237 Not tainted 6.14.0-rc6-syzkaller-00212-geb88e6bfbc0a #0 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > RIP: 0010:vma_merge_existing_range+0x266/0x2070 mm/vma.c:734 > > Code: e8 5f 25 ad ff 48 8b 14 24 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 1c 19 00 00 48 8b 04 24 48 8b 74 24 08 <4c> 8b 38 4c 89 ff e8 9f 1f ad ff 48 8b 44 24 08 49 39 c7 0f 83 db > > RSP: 0000:ffffc9000319f988 EFLAGS: 00010246 > > RAX: fffffffffffffff4 RBX: ffffc9000319fae8 RCX: ffffffff820cd3e5 > > RDX: 1ffffffffffffffe RSI: 0000000080c2a000 RDI: 0000000000000005 > > RBP: 0000000080ce2000 R08: 0000000000000005 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001 > > R13: ffffc9000319fb08 R14: ffff888025eddc98 R15: ffff88804eec0a00 > > FS: 0000000000000000(0000) GS:ffff88802b500000(0063) knlGS:00000000f5106b40 > > CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > > CR2: fffffffffffffff4 CR3: 00000000614d6000 CR4: 0000000000352ef0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > vma_modify.constprop.0+0x87/0x410 mm/vma.c:1517 > > vma_modify_flags_uffd+0x241/0x2e0 mm/vma.c:1598 > > userfaultfd_clear_vma+0x91/0x130 mm/userfaultfd.c:1906 > > userfaultfd_release_all+0x2ae/0x4c0 mm/userfaultfd.c:2024 > > userfaultfd_release+0xf4/0x1c0 fs/userfaultfd.c:865 > > __fput+0x3ff/0xb70 fs/file_table.c:464 > > task_work_run+0x14e/0x250 kernel/task_work.c:227 > > resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] > > exit_to_user_mode_loop kernel/entry/common.c:114 [inline] > > exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] > > __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > > syscall_exit_to_user_mode+0x27b/0x2a0 kernel/entry/common.c:218 > > __do_fast_syscall_32+0x80/0x120 arch/x86/entry/common.c:390 > > do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:412 > > entry_SYSENTER_compat_after_hwframe+0x84/0x8e > > RIP: 0023:0xf7fe6579 > > Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 > > RSP: 002b:00000000f510655c EFLAGS: 00000296 ORIG_RAX: 0000000000000135 > > RAX: 0000000000000001 RBX: 0000000080000180 RCX: 0000000000000001 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 > > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > </TASK> > > Modules linked in: > > CR2: fffffffffffffff4 > > ---[ end trace 0000000000000000 ]--- > > RIP: 0010:vma_merge_existing_range+0x266/0x2070 mm/vma.c:734 > > Code: e8 5f 25 ad ff 48 8b 14 24 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 1c 19 00 00 48 8b 04 24 48 8b 74 24 08 <4c> 8b 38 4c 89 ff e8 9f 1f ad ff 48 8b 44 24 08 49 39 c7 0f 83 db > > RSP: 0000:ffffc9000319f988 EFLAGS: 00010246 > > RAX: fffffffffffffff4 RBX: ffffc9000319fae8 RCX: ffffffff820cd3e5 > > RDX: 1ffffffffffffffe RSI: 0000000080c2a000 RDI: 0000000000000005 > > RBP: 0000000080ce2000 R08: 0000000000000005 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001 > > R13: ffffc9000319fb08 R14: ffff888025eddc98 R15: ffff88804eec0a00 > > FS: 0000000000000000(0000) GS:ffff88802b500000(0063) knlGS:00000000f5106b40 > > CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > > CR2: fffffffffffffff4 CR3: 00000000614d6000 CR4: 0000000000352ef0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > ---------------- > > Code disassembly (best guess): > > 0: e8 5f 25 ad ff call 0xffad2564 > > 5: 48 8b 14 24 mov (%rsp),%rdx > > 9: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > > 10: fc ff df > > 13: 48 c1 ea 03 shr $0x3,%rdx > > 17: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) > > 1b: 0f 85 1c 19 00 00 jne 0x193d > > 21: 48 8b 04 24 mov (%rsp),%rax > > 25: 48 8b 74 24 08 mov 0x8(%rsp),%rsi > > * 2a: 4c 8b 38 mov (%rax),%r15 <-- trapping instruction > > 2d: 4c 89 ff mov %r15,%rdi > > 30: e8 9f 1f ad ff call 0xffad1fd4 > > 35: 48 8b 44 24 08 mov 0x8(%rsp),%rax > > 3a: 49 39 c7 cmp %rax,%r15 > > 3d: 0f .byte 0xf > > 3e: 83 .byte 0x83 > > 3f: db .byte 0xdb > > Ahh, fun bug. This *seems* to be the bug: > > First, in vma_modify: > > merged = vma_merge_existing_range(vmg); > if (merged) > return merged; > if (vmg_nomem(vmg)) > return ERR_PTR(-ENOMEM); > > then, all the way up to userfaultfd_release_all (the return value propagates > vma_modify -> vma_modify_flags_uffd -> userfaultfd_clear_vma): > > prev = NULL; > for_each_vma(vmi, vma) { > cond_resched(); > BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ > !!(vma->vm_flags & __VM_UFFD_FLAGS)); > if (vma->vm_userfaultfd_ctx.ctx != ctx) { > prev = vma; > continue; > } > > vma = userfaultfd_clear_vma(&vmi, prev, vma, > vma->vm_start, vma->vm_end); > prev = vma; > } > > So, if uffd gets an IS_ERR(vma), it keeps going and takes that vma as the prev value, > which leads to that ERR_PTR(-ENOMEM) deref crash (-12 = -ENOMEM = 0xffffff4). > This situation is kind of awkward because ->release() errors don't mean a thing. > So, I have another idea (pasting for syzbot) which might just be cromulent. > Untested, but thoughts? > > #syz test > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index d06453fa8aba..fb835d82eb84 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -2023,6 +2023,8 @@ void userfaultfd_release_all(struct mm_struct *mm, > > vma = userfaultfd_clear_vma(&vmi, prev, vma, > vma->vm_start, vma->vm_end); > + if (WARN_ON(IS_ERR(vma))) > + break; If this WARN_ON() was ever actually hit, I think we'd leave dangling pointers in VMAs? As much as Linus hates BUG_ON(), I personally think that would be a situation warranting BUG_ON(), or at least CHECK_DATA_CORRUPTION(). That said: > prev = vma; > } > mmap_write_unlock(mm); > diff --git a/mm/vma.c b/mm/vma.c > index 71ca012c616c..b2167b7dc27d 100644 > --- a/mm/vma.c > +++ b/mm/vma.c > @@ -1517,8 +1517,16 @@ static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg) > merged = vma_merge_existing_range(vmg); > if (merged) > return merged; > - if (vmg_nomem(vmg)) > + if (vmg_nomem(vmg)) { > + /* If we can avoid failing the whole modification > + * due to a merge OOM and validly keep going > + * (we're modifying the whole VMA), return vma intact. > + * It won't get merged, but such is life - we're avoiding > + * OOM conditions in other parts of mm/ this way */ > + if (start <= vma->vm_start && end >= vma->vm_end) > + return vma; > return ERR_PTR(-ENOMEM); > + } Along the lines of your idea, perhaps we could add a parameter "bool never_fail" to vma_modify() that is passed through to vma_merge_existing_range(), and guarantee that it never fails when that parameter is set? Then we could also check that never_fail is only used in cases where no split is necessary. That somewhat avoids having this kind of check that only ever runs in error conditions...