Re: [syzbot] [mm?] general protection fault in find_mergeable_anon_vma

Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> · Mon, 9 Dec 2024 13:41:15 +0000

OK I think I know what's going on.

As mentioned below, fault injection makes vm_area_dup() fail on fork, and the
fork error path then does:

 	mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
 	mas_store(&vmi.mas, XA_ZERO_ENTRY);

Which leaves XA_ZERO_ENTRY stored over that range in the mm.

Then, in find_mergeable_anon_vma():

struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
{
	...

	/* Try next first. */
	next = vma_iter_load(&vmi);
	if (next) {
		anon_vma = reusable_anon_vma(next, vma, next);
		...
	}

	...
}

So we call vma_iter_load() -> mas_walk(), which (unlike mtree_load()) can
return an XA_ZERO_ENTRY.

So here next might be equal to XA_ZERO_ENTRY, which is non-NULL, so the
if (next) check does not catch it.
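
To spell out why the NULL check doesn't help, here is a small userspace sketch
(not kernel code). xa_mk_internal() and XA_ZERO_ENTRY mirror what I believe
include/linux/xarray.h defines; passes_null_check() and demo() are hypothetical
names invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the internal-entry encoding: (v << 2) | 2. */
static inline void *xa_mk_internal(unsigned long v)
{
	return (void *)((v << 2) | 2);
}

/* The zero entry is internal entry number 257. */
#define XA_ZERO_ENTRY	xa_mk_internal(257)

/* Hypothetical stand-in for the "if (next)" test in the caller. */
static inline bool passes_null_check(const void *next)
{
	return next != NULL;
}

/* Usage: the zero entry is the address 0x406 and is not NULL. */
static inline void demo(void)
{
	assert((uintptr_t)XA_ZERO_ENTRY == 0x406);
	assert(passes_null_check(XA_ZERO_ENTRY));
}
```

(257 << 2) | 2 is 1030, i.e. 0x406, so the entry looks like a perfectly
ordinary non-NULL pointer to anything that only tests for NULL.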

Then in reusable_anon_vma(), where b == next == XA_ZERO_ENTRY:

static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old,
					  struct vm_area_struct *a,
					  struct vm_area_struct *b)
{
	if (anon_vma_compatible(a, b)) {
		...
	}
	...
}

And in anon_vma_compatible():

static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b)
{
	return a->vm_end == b->vm_start && ...
}

So reading b->vm_start, which is at offset 0 into the vm_area_struct, is an
access at address 0x406 and *boom*.

So we need to be a lot more careful about our use of XA_ZERO_ENTRY.
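
As a rough illustration of what "more careful" could look like (a userspace
sketch only, not a proposed patch), the same xa_is_zero() filtering that
mtree_load() does can be applied to whatever a lookup returns. struct mock_vma,
vma_or_null() and demo() are hypothetical names; the xarray helpers are
re-declared here so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the two leading fields only; not the real struct layout. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Userspace copies of the xarray helpers, re-declared for this sketch. */
static inline void *xa_mk_internal(unsigned long v)
{
	return (void *)((v << 2) | 2);
}

#define XA_ZERO_ENTRY	xa_mk_internal(257)

static inline int xa_is_zero(const void *entry)
{
	return entry == XA_ZERO_ENTRY;
}

/* Hypothetical helper: report a zero entry as "no VMA here". */
static inline struct mock_vma *vma_or_null(void *entry)
{
	if (xa_is_zero(entry))
		return NULL;
	return entry;
}

/* Usage: zero entries and NULL are rejected, real pointers pass through. */
static inline void demo(void)
{
	struct mock_vma vma = { .vm_start = 0x1000, .vm_end = 0x2000 };

	assert(vma_or_null(XA_ZERO_ENTRY) == NULL);
	assert(vma_or_null(&vma) == &vma);
	assert(vma_or_null(NULL) == NULL);
}
```

Where such a check actually belongs (in the iterator helpers or at each call
site) is a separate question that this sketch doesn't try to answer.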

As per Jann's reply to the thread, R13 holds the address that the KASAN
instrumentation is checking, which is 0x406, i.e. XA_ZERO_ENTRY, so I think
this explanation is pretty much confirmed.
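
The numbers in the report line up with that. Below is a userspace sketch of the
shadow address computation, assuming the x86-64 generic KASAN layout that the
disassembly shows (shr $0x3, then add 0xdffffc0000000000). The function names
are mine and purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the disassembly in the report below. */
#define KASAN_SHADOW_SCALE_SHIFT	3
#define KASAN_SHADOW_OFFSET		0xdffffc0000000000ULL

/* Shadow byte address for a given address (R12 + RAX in the trace). */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* First and last byte of the 8-byte granule covering addr. */
static inline uint64_t granule_start(uint64_t addr)
{
	return addr & ~7ULL;
}

static inline uint64_t granule_end(uint64_t addr)
{
	return addr | 7ULL;
}

/* Usage: feed in R13 and compare against the oops header. */
static inline void demo(void)
{
	assert(kasan_shadow_addr(0x406) == 0xdffffc0000000080ULL);
	assert(granule_start(0x406) == 0x400);
	assert(granule_end(0x406) == 0x407);
}
```

So R13 = 0x406 gives R12 = 0x80, the shadow lookup faults at the non-canonical
0xdffffc0000000080, and KASAN reports the range [0x400-0x407].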

Liam - thoughts?

On Mon, Dec 09, 2024 at 12:53:45PM +0000, Lorenzo Stoakes wrote:
> On Mon, Dec 09, 2024 at 03:20:19AM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    feffde684ac2 Merge tag 'for-6.13-rc1-tag' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17f85fc0580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=50c7a61469ce77e7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=2d788f4f7cb660dac4b7
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
>
> Points to this being racey.
>
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-feffde68.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/6135c7297e8e/vmlinux-feffde68.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/6c154fdcc9cb/bzImage-feffde68.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+2d788f4f7cb660dac4b7@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc0000000080: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x0000000000000400-0x0000000000000407]
>
> This doesn't make a huge amount of sense to me, the VMA is not 0x400 (1,024)
> bytes in size... and the actual faulting offset seems to be 0xdffffc0000000080
> which is 0x80 off from some KASAN-specified value?
>
> This would be vma->vm_file. But that also doesn't really make any sense.
>
> But I wonder...
>
> I see in the report at [0] that there's a failure injection in vm_area_dup() on
> fork:
>
> [   73.842623][ T5318]  ? kmem_cache_alloc_noprof+0x48/0x380
> [   73.844725][ T5318]  ? __pfx___might_resched+0x10/0x10
> [   73.846687][ T5318]  should_fail_ex+0x3b0/0x4e0
> [   73.848496][ T5318]  should_failslab+0xac/0x100
> [   73.850232][ T5318]  ? vm_area_dup+0x27/0x290
> [   73.852017][ T5318]  kmem_cache_alloc_noprof+0x70/0x380
> [   73.854011][ T5318]  vm_area_dup+0x27/0x290
> [   73.855771][ T5318]  copy_mm+0xc1d/0x1f90
>
> I also see in the fork logic we have the following code on error path:
>
> 	mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> 	mas_store(&vmi.mas, XA_ZERO_ENTRY);
>
> And XA_ZERO_ENTRY is 0x406.
>
> Now if _somehow_ the VMA was being looked up without XA_ZERO_ENTRY being
> properly accounted for, this might explain it, and why all the !vma logic would
> be bypassed.
>
> [0]:https://syzkaller.appspot.com/x/log.txt?x=17f85fc0580000
>
> I mean the weird thing for me here is that mtree_load() has:
>
> 	if (xa_is_zero(entry))
> 		return NULL;
>
> So you'd think it'd pick this up, but maybe if we're not actually holding
> the right lock we get a partial write/race of some kind
> and... yeah. Anything's possible then I suppose...
>
> > CPU: 0 UID: 0 PID: 5319 Comm: syz.0.0 Not tainted 6.13.0-rc1-syzkaller-00025-gfeffde684ac2 #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > RIP: 0010:anon_vma_compatible mm/vma.c:1804 [inline]
>
> This is in:
>
> static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *b)
> {
> 	return a->vm_end == b->vm_start && <-- this line
>
> This suggests that either a->vm_end (offset 0x8 into the VMA) or b->vm_start
> (offset 0 into the VMA) are being null pointer deref'd assuming the compiler is
> specifically referring to this _typographical_ line rather than the expression
> as a whole.
>
> > RIP: 0010:reusable_anon_vma mm/vma.c:1837 [inline]
> > RIP: 0010:find_mergeable_anon_vma+0x1e4/0x8f0 mm/vma.c:1863
> > Code: 00 00 00 00 fc ff df 41 80 3c 06 00 74 08 4c 89 ff e8 10 39 10 00 4d 8b 37 4d 89 ec 49 c1 ec 03 48 b8 00 00 00 00 00 fc ff df <41> 80 3c 04 00 74 08 4c 89 ef e8 ed 38 10 00 49 8b 5d 00 4c 89 f7
> > RSP: 0018:ffffc9000d3df500 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: ffffc9000d3df540 RCX: ffff88801cf80000
> > RDX: 0000000000000000 RSI: ffffffff900062a0 RDI: 0000000000000000
> > RBP: ffffc9000d3df610 R08: 0000000000000005 R09: ffffffff8bc6b642
> > R10: 0000000000000003 R11: ffff88801cf80000 R12: 0000000000000080
> > R13: 0000000000000406 R14: 0000000021000000 R15: ffff8880120d4ca0
> > FS:  00007f137f7e86c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000140 CR3: 0000000040256000 CR4: 0000000000352ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  <TASK>
> >  __anon_vma_prepare+0xd9/0x4a0 mm/rmap.c:199
> >  anon_vma_prepare include/linux/rmap.h:164 [inline]
> >  uprobe_write_opcode+0x1a95/0x2d80 kernel/events/uprobes.c:516
>
> Here we find the VMA via:
>
> 	old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
>
> Actually one unfortunate thing here is... ugh god.
>
> I think there might be a bug in get_user_page_vma_remote()...
>
> I will check in more detail but I don't see anything that will prevent the
> mmap lock from being dropped before we perform the
> vma_lookup()... FOLL_UNLOCKABLE will be set due to the &local_lock
> shenanigans in get_user_pages_remote(), and if we get a page after a
> dropped lock and try to vma_lookup() we could be racing... :/
>
> Let me look into that more...
>
> >  install_breakpoint+0x4fc/0x660 kernel/events/uprobes.c:1135
> >  register_for_each_vma+0xa08/0xc50 kernel/events/uprobes.c:1275
> >  uprobe_register+0x811/0x970 kernel/events/uprobes.c:1384
> >  bpf_uprobe_multi_link_attach+0xaca/0xdd0 kernel/trace/bpf_trace.c:3442
> >  link_create+0x6d7/0x870 kernel/bpf/syscall.c:5399
> >  __sys_bpf+0x4bc/0x810 kernel/bpf/syscall.c:5860
> >  __do_sys_bpf kernel/bpf/syscall.c:5897 [inline]
> >  __se_sys_bpf kernel/bpf/syscall.c:5895 [inline]
> >  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5895
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f137e97ff19
> > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007f137f7e8058 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> > RAX: ffffffffffffffda RBX: 00007f137eb46080 RCX: 00007f137e97ff19
> > RDX: 000000000000003c RSI: 00000000200012c0 RDI: 000000000000001c
> > RBP: 00007f137e9f3986 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000000 R14: 00007f137eb46080 R15: 00007fff36be56b8
> >  </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:anon_vma_compatible mm/vma.c:1804 [inline]
> > RIP: 0010:reusable_anon_vma mm/vma.c:1837 [inline]
> > RIP: 0010:find_mergeable_anon_vma+0x1e4/0x8f0 mm/vma.c:1863
> > Code: 00 00 00 00 fc ff df 41 80 3c 06 00 74 08 4c 89 ff e8 10 39 10 00 4d 8b 37 4d 89 ec 49 c1 ec 03 48 b8 00 00 00 00 00 fc ff df <41> 80 3c 04 00 74 08 4c 89 ef e8 ed 38 10 00 49 8b 5d 00 4c 89 f7
> > RSP: 0018:ffffc9000d3df500 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: ffffc9000d3df540 RCX: ffff88801cf80000
> > RDX: 0000000000000000 RSI: ffffffff900062a0 RDI: 0000000000000000
> > RBP: ffffc9000d3df610 R08: 0000000000000005 R09: ffffffff8bc6b642
> > R10: 0000000000000003 R11: ffff88801cf80000 R12: 0000000000000080
> > R13: 0000000000000406 R14: 0000000021000000 R15: ffff8880120d4ca0
> > FS:  00007f137f7e86c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020002240 CR3: 0000000040256000 CR4: 0000000000352ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > ----------------
> > Code disassembly (best guess), 6 bytes skipped:
> >    0:	df 41 80             	filds  -0x80(%rcx)
> >    3:	3c 06                	cmp    $0x6,%al
> >    5:	00 74 08 4c          	add    %dh,0x4c(%rax,%rcx,1)
> >    9:	89 ff                	mov    %edi,%edi
> >    b:	e8 10 39 10 00       	call   0x103920
> >   10:	4d 8b 37             	mov    (%r15),%r14
> >   13:	4d 89 ec             	mov    %r13,%r12
> >   16:	49 c1 ec 03          	shr    $0x3,%r12
> >   1a:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
> >   21:	fc ff df
> > * 24:	41 80 3c 04 00       	cmpb   $0x0,(%r12,%rax,1) <-- trapping instruction
> >   29:	74 08                	je     0x33
> >   2b:	4c 89 ef             	mov    %r13,%rdi
> >   2e:	e8 ed 38 10 00       	call   0x103920
> >   33:	49 8b 5d 00          	mov    0x0(%r13),%rbx
> >   37:	4c 89 f7             	mov    %r14,%rdi
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup