On Mon, Dec 09, 2024 at 10:33:56AM -0500, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [241209 08:58]:
> > On Mon, Dec 09, 2024 at 02:52:17PM +0100, Jann Horn wrote:
> > > On Mon, Dec 9, 2024 at 1:53 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Dec 09, 2024 at 03:20:19AM -0800, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit:    feffde684ac2 Merge tag 'for-6.13-rc1-tag' of git://git.ker..
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f85fc0580000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=50c7a61469ce77e7
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=2d788f4f7cb660dac4b7
> > > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Points to this being racey.
> > > >
> > > > >
> > > > > Downloadable assets:
> > > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-feffde68.raw.xz
> > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6135c7297e8e/vmlinux-feffde68.xz
> > > > > kernel image: https://storage.googleapis.com/syzbot-assets/6c154fdcc9cb/bzImage-feffde68.xz
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+2d788f4f7cb660dac4b7@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > >
> > > > > Oops: general protection fault, probably for non-canonical address 0xdffffc0000000080: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > > > > KASAN: null-ptr-deref in range [0x0000000000000400-0x0000000000000407]
> > > >
> > > > This doesn't make a huge amount of sense to me, the VMA is not 0x400 (1,024)
> > > > bytes in size...
> > > > and the actual faulting offset seems to be 0xdffffc0000000080
> > > > which is 0x80 off from some KASAN-specified value?
> > > >
> > > > This would be vma->vm_file. But that also doesn't really make any sense.
> > > >
> > > > But I wonder...
> > > >
> > > > I see in the report at [0] that there's a failure injection in vm_area_dup() on
> > > > fork:
> > > >
> > > > [ 73.842623][ T5318] ? kmem_cache_alloc_noprof+0x48/0x380
> > > > [ 73.844725][ T5318] ? __pfx___might_resched+0x10/0x10
> > > > [ 73.846687][ T5318] should_fail_ex+0x3b0/0x4e0
> > > > [ 73.848496][ T5318] should_failslab+0xac/0x100
> > > > [ 73.850232][ T5318] ? vm_area_dup+0x27/0x290
> > > > [ 73.852017][ T5318] kmem_cache_alloc_noprof+0x70/0x380
> > > > [ 73.854011][ T5318] vm_area_dup+0x27/0x290
> > > > [ 73.855771][ T5318] copy_mm+0xc1d/0x1f90
> > > >
> > > > I also see in the fork logic we have the following code on error path:
> > > >
> > > > 	mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > > > 	mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > > >
> > > > And XA_ZERO_ENTRY is 0x406.
> > > >
> > > > Now if _somehow_ the VMA was being looked up without XA_ZERO_ENTRY being
> > > > properly accounted for, this might explain it, and why all the !vma logic would
> > > > be bypassed.
> > >
> > > You fixed another issue in this area a month ago, right?
> > > (https://project-zero.issues.chromium.org/373391951,
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f64e67e5d3a45a4a04286c47afade4b518acd47b,
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=985da552a98e27096444508ce5d853244019111f)
> >
> > That's for ksm/uffd though, neither pertinent here.
> >
> > >
> > > And we came to the conclusion that MMs whose VMAs have not been
> > > completely copied and might have XA_ZERO_ENTRY entries left should
> > > never become visible to anything other than the MM teardown code?
> >
> > Well if we came to that conclusion, it was wrong!
> > :)
> >
> > Error paths at play again. I mean I think probably the slab allocation is 'too
> > small to fail' _in reality_. But somebody will point out some horrendous way
> > involving a fatal signal or what-not where we could hit this. Maybe.
> >
> > >
> > > >
> > > > > RIP: 0010:reusable_anon_vma mm/vma.c:1837 [inline]
> > > > > RIP: 0010:find_mergeable_anon_vma+0x1e4/0x8f0 mm/vma.c:1863
> > > > > Code: 00 00 00 00 fc ff df 41 80 3c 06 00 74 08 4c 89 ff e8 10 39 10 00 4d 8b 37 4d 89 ec 49 c1 ec 03 48 b8 00 00 00 00 00 fc ff df <41> 80 3c 04 00 74 08 4c 89 ef e8 ed 38 10 00 49 8b 5d 00 4c 89 f7
> > > > > RSP: 0018:ffffc9000d3df500 EFLAGS: 00010203
> > > > > RAX: dffffc0000000000 RBX: ffffc9000d3df540 RCX: ffff88801cf80000
> > > > > RDX: 0000000000000000 RSI: ffffffff900062a0 RDI: 0000000000000000
> > > > > RBP: ffffc9000d3df610 R08: 0000000000000005 R09: ffffffff8bc6b642
> > > > > R10: 0000000000000003 R11: ffff88801cf80000 R12: 0000000000000080
> > > > > R13: 0000000000000406 R14: 0000000021000000 R15: ffff8880120d4ca0
> > > > > FS:  00007f137f7e86c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 0000000020000140 CR3: 0000000040256000 CR4: 0000000000352ef0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  __anon_vma_prepare+0xd9/0x4a0 mm/rmap.c:199
> > > > >  anon_vma_prepare include/linux/rmap.h:164 [inline]
> > > > >  uprobe_write_opcode+0x1a95/0x2d80 kernel/events/uprobes.c:516
> > > >
> > > > Here we find the VMA via:
> > > >
> > > > 	old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
> > > >
> > > > Actually one unfortunate thing here is... ugh god.
> > > >
> > > > I think there might be a bug in get_user_page_vma_remote()...
> > > >
> > > > I will check in more detail but I don't see anything that will prevent the
> > > > mmap lock from being dropped before we perform the
> > > > vma_lookup()... FOLL_UNLOCKABLE will be set due to the &local_lock
> > > > shenanigans in get_user_pages_remote(), and if we get a page after a
> > > > dropped lock and try to vma_lookup() we could be racing... :/
> > >
> > > Hm, aren't we holding an mmap_write_lock() across the whole operation
> > > in register_for_each_vma()? I don't think FOLL_UNLOCKABLE will be set,
> > > the call from get_user_pages_remote() to is_valid_gup_args() passes
> > > the caller's "locked" parameter, not &local_locked.
> >
> > Yeah I was just about to reply saying this, that code should be cleaned up
> > a bit to make clear... But yeah it's the bool *locked of the invoker, and
> > can't be &local_locked.
> >
> > So yes this rules out get_user_page_vma_remote() as a problem, which is
> > good, because I wrote that :P
>
> The mm_struct isn't fully initialized at this point - and won't be once
> the dup_mmap() fails. How exactly are we getting to this point in the
> first place?
>
> I have some ideas on fixing this particular issue in the not fully
> initialised mm structure, but we will still be using a
> not-fully-initialised mm structure and that sounds wrong on a whole
> other level.

It seems like uprobe can still connect at least via bpf... it uses
dup_mmap_sem to prevent races with dup_mmap(), but then in no way checks to
see if the fork _succeeded_ and assumes that the uprobe is good to go.

I wonder if we can tell uprobe... not to do this in that case :)

Some MMF_xxx maybe could help us? I guess we're full up there... but maybe
MMF_UNSTABLE somehow?

>
> Thanks,
> Liam
>
>