On Mon, Dec 09, 2024 at 03:20:19AM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    feffde684ac2 Merge tag 'for-6.13-rc1-tag' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17f85fc0580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=50c7a61469ce77e7
> dashboard link: https://syzkaller.appspot.com/bug?extid=2d788f4f7cb660dac4b7
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.

Points to this being racy.

>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-feffde68.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/6135c7297e8e/vmlinux-feffde68.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/6c154fdcc9cb/bzImage-feffde68.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+2d788f4f7cb660dac4b7@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000080: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000400-0x0000000000000407]

This doesn't make a huge amount of sense to me, the VMA is not 0x400
(1,024) bytes in size... and the actual faulting offset seems to be
0xdffffc0000000080 which is 0x80 off from some KASAN-specified value? This
would be vma->vm_file.

But that also doesn't really make any sense.

But I wonder... I see in the report at [0] that there's a failure injection
in vm_area_dup() on fork:

[   73.842623][ T5318]  ? kmem_cache_alloc_noprof+0x48/0x380
[   73.844725][ T5318]  ? __pfx___might_resched+0x10/0x10
[   73.846687][ T5318]  should_fail_ex+0x3b0/0x4e0
[   73.848496][ T5318]  should_failslab+0xac/0x100
[   73.850232][ T5318]  ? vm_area_dup+0x27/0x290
[   73.852017][ T5318]  kmem_cache_alloc_noprof+0x70/0x380
[   73.854011][ T5318]  vm_area_dup+0x27/0x290
[   73.855771][ T5318]  copy_mm+0xc1d/0x1f90

I also see that in the fork logic we have the following code on the error
path:

	mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
	mas_store(&vmi.mas, XA_ZERO_ENTRY);

And XA_ZERO_ENTRY is 0x406.

Now if _somehow_ the VMA was being looked up without XA_ZERO_ENTRY being
properly accounted for, this might explain it, and why all the !vma logic
would be bypassed.

[0]:https://syzkaller.appspot.com/x/log.txt?x=17f85fc0580000

I mean the weird thing for me here is that mtree_load() has:

	if (xa_is_zero(entry))
		return NULL;

So you'd think it'd pick this up, but maybe if we're not actually holding
the right lock we get a partial write/race of some kind and... yeah.
Anything's possible then I suppose...

> CPU: 0 UID: 0 PID: 5319 Comm: syz.0.0 Not tainted 6.13.0-rc1-syzkaller-00025-gfeffde684ac2 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:anon_vma_compatible mm/vma.c:1804 [inline]

This is in:

static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b)
{
	return a->vm_end == b->vm_start && <-- this line

This suggests that either a->vm_end (offset 0x8 into the VMA) or
b->vm_start (offset 0 into the VMA) is being null pointer deref'd, assuming
the compiler is specifically referring to this _typographical_ line rather
than the expression as a whole.

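For reference, the 0x406 value seems to fall straight out of the XArray
internal-entry encoding. Below is a minimal userspace sketch, assuming I am
remembering include/linux/xarray.h correctly (internal entries are encoded
as (v << 2) | 2, and the zero entry is internal value 257). All the model_
names are made up for illustration and are not kernel symbols:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the XArray internal-entry encoding. This mirrors
 * xa_mk_internal() as I understand it: shift the value left by two and
 * set bit 1 to mark the entry as internal.
 */
static inline uintptr_t model_xa_mk_internal(unsigned long v)
{
	return ((uintptr_t)v << 2) | 2;
}

/* Assumption: XA_ZERO_ENTRY is internal value 257, which encodes to 0x406. */
#define MODEL_XA_ZERO_ENTRY model_xa_mk_internal(257)

static inline bool model_xa_is_zero(uintptr_t entry)
{
	return entry == MODEL_XA_ZERO_ENTRY;
}

/* Lookup that filters the zero entry, like the mtree_load() check quoted above. */
static inline uintptr_t model_load_filtered(uintptr_t slot)
{
	return model_xa_is_zero(slot) ? 0 : slot;
}

/* Hypothetical lookup that forgets the filter and returns the raw slot. */
static inline uintptr_t model_load_raw(uintptr_t slot)
{
	return slot;
}
```

With a slot holding the zero entry, model_load_filtered() yields 0 (NULL),
so a !vma check would fire, whereas model_load_raw() hands back 0x406,
which is non-NULL and would pass any !vma check. That is only a model of
the suspicion, not a claim about which kernel path actually skips the
filter.
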
> RIP: 0010:reusable_anon_vma mm/vma.c:1837 [inline]
> RIP: 0010:find_mergeable_anon_vma+0x1e4/0x8f0 mm/vma.c:1863
> Code: 00 00 00 00 fc ff df 41 80 3c 06 00 74 08 4c 89 ff e8 10 39 10 00 4d 8b 37 4d 89 ec 49 c1 ec 03 48 b8 00 00 00 00 00 fc ff df <41> 80 3c 04 00 74 08 4c 89 ef e8 ed 38 10 00 49 8b 5d 00 4c 89 f7
> RSP: 0018:ffffc9000d3df500 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: ffffc9000d3df540 RCX: ffff88801cf80000
> RDX: 0000000000000000 RSI: ffffffff900062a0 RDI: 0000000000000000
> RBP: ffffc9000d3df610 R08: 0000000000000005 R09: ffffffff8bc6b642
> R10: 0000000000000003 R11: ffff88801cf80000 R12: 0000000000000080
> R13: 0000000000000406 R14: 0000000021000000 R15: ffff8880120d4ca0
> FS:  00007f137f7e86c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000140 CR3: 0000000040256000 CR4: 0000000000352ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __anon_vma_prepare+0xd9/0x4a0 mm/rmap.c:199
>  anon_vma_prepare include/linux/rmap.h:164 [inline]
>  uprobe_write_opcode+0x1a95/0x2d80 kernel/events/uprobes.c:516

Here we find the VMA via:

	old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);

Actually one unfortunate thing here is... ugh god. I think there might be a
bug in get_user_page_vma_remote()...

I will check in more detail but I don't see anything that will prevent the
mmap lock from being dropped before we perform the vma_lookup()...

FOLL_UNLOCKABLE will be set due to the &local_lock shenanigans in
get_user_pages_remote(), and if we get a page after a dropped lock and try
to vma_lookup() we could be racing... :/

Let me look into that more...

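Coming back to the faulting address: the numbers in the register dump seem
to fit an inline KASAN shadow check on the pointer held in R13. Here is a
rough sketch of the arithmetic, assuming the usual x86-64 generic KASAN
mapping (shadow = (addr >> 3) + 0xdffffc0000000000, which matches the
shr $0x3 and the movabs constant in the disassembly further down). The
function and macro names are mine, not the kernel's:

```c
#include <stdint.h>

/* Assumed x86-64 generic KASAN parameters; both values appear in the oops. */
#define MODEL_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
#define MODEL_KASAN_SHADOW_SHIFT  3

/* Shadow byte address that the inline check reads for a given pointer. */
static inline uint64_t model_kasan_shadow_addr(uint64_t addr)
{
	return (addr >> MODEL_KASAN_SHADOW_SHIFT) + MODEL_KASAN_SHADOW_OFFSET;
}

/* First byte of the 8-byte granule covered by that shadow byte. */
static inline uint64_t model_granule_start(uint64_t addr)
{
	return addr & ~((1ULL << MODEL_KASAN_SHADOW_SHIFT) - 1);
}

/* Last byte of the same granule. */
static inline uint64_t model_granule_end(uint64_t addr)
{
	return model_granule_start(addr) | ((1ULL << MODEL_KASAN_SHADOW_SHIFT) - 1);
}
```

Feeding in R13 = 0x406 gives a shadow address of 0xdffffc0000000080 (that
is R12 = 0x80 plus RAX) and a granule of 0x400-0x407. So the range in the
KASAN line would be the granule around the bogus pointer rather than a VMA
size or a field offset. If that reading is right, the thing being
dereferenced is the XA_ZERO_ENTRY value itself.
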
>  install_breakpoint+0x4fc/0x660 kernel/events/uprobes.c:1135
>  register_for_each_vma+0xa08/0xc50 kernel/events/uprobes.c:1275
>  uprobe_register+0x811/0x970 kernel/events/uprobes.c:1384
>  bpf_uprobe_multi_link_attach+0xaca/0xdd0 kernel/trace/bpf_trace.c:3442
>  link_create+0x6d7/0x870 kernel/bpf/syscall.c:5399
>  __sys_bpf+0x4bc/0x810 kernel/bpf/syscall.c:5860
>  __do_sys_bpf kernel/bpf/syscall.c:5897 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:5895 [inline]
>  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5895
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f137e97ff19
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f137f7e8058 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 00007f137eb46080 RCX: 00007f137e97ff19
> RDX: 000000000000003c RSI: 00000000200012c0 RDI: 000000000000001c
> RBP: 00007f137e9f3986 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007f137eb46080 R15: 00007fff36be56b8
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:anon_vma_compatible mm/vma.c:1804 [inline]
> RIP: 0010:reusable_anon_vma mm/vma.c:1837 [inline]
> RIP: 0010:find_mergeable_anon_vma+0x1e4/0x8f0 mm/vma.c:1863
> Code: 00 00 00 00 fc ff df 41 80 3c 06 00 74 08 4c 89 ff e8 10 39 10 00 4d 8b 37 4d 89 ec 49 c1 ec 03 48 b8 00 00 00 00 00 fc ff df <41> 80 3c 04 00 74 08 4c 89 ef e8 ed 38 10 00 49 8b 5d 00 4c 89 f7
> RSP: 0018:ffffc9000d3df500 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: ffffc9000d3df540 RCX: ffff88801cf80000
> RDX: 0000000000000000 RSI: ffffffff900062a0 RDI: 0000000000000000
> RBP: ffffc9000d3df610 R08: 0000000000000005 R09: ffffffff8bc6b642
> R10: 0000000000000003 R11: ffff88801cf80000 R12: 0000000000000080
> R13: 0000000000000406 R14: 0000000021000000 R15: ffff8880120d4ca0
> FS:  00007f137f7e86c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020002240 CR3: 0000000040256000 CR4: 0000000000352ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> ----------------
> Code disassembly (best guess), 6 bytes skipped:
>    0:   df 41 80                filds  -0x80(%rcx)
>    3:   3c 06                   cmp    $0x6,%al
>    5:   00 74 08 4c             add    %dh,0x4c(%rax,%rcx,1)
>    9:   89 ff                   mov    %edi,%edi
>    b:   e8 10 39 10 00          call   0x103920
>   10:   4d 8b 37                mov    (%r15),%r14
>   13:   4d 89 ec                mov    %r13,%r12
>   16:   49 c1 ec 03             shr    $0x3,%r12
>   1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>   21:   fc ff df
> * 24:   41 80 3c 04 00          cmpb   $0x0,(%r12,%rax,1) <-- trapping instruction
>   29:   74 08                   je     0x33
>   2b:   4c 89 ef                mov    %r13,%rdi
>   2e:   e8 ed 38 10 00          call   0x103920
>   33:   49 8b 5d 00             mov    0x0(%r13),%rbx
>   37:   4c 89 f7                mov    %r14,%rdi
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup