+ Suren for VMA lock aspects.

On Mon, Sep 23, 2024 at 10:44:34AM GMT, Lorenzo Stoakes wrote:
> On Mon, Sep 23, 2024 at 02:04:23AM GMT, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    88264981f208 Merge tag 'sched_ext-for-6.12' of git://git.k..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1237ec27980000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=e6702f5f2b8ed242
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e01fa33e67abb0b3b3bb
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Thanks for the report, investigating.

TL;DR: I think there's a race, but it's in code that detects and handles
races, so we probably just need to add a data_race() or similar annotation.

I'll leave it to Liam (who is currently on leave) to address annotating
this if appropriate on his return, unless this becomes more urgent in the
meantime.

> > Unfortunately, I don't have any reproducer for this issue yet.
>
> I suspect given this is so timing-specific, a reproducer might be
> difficult.
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/95bba355b2ed/disk-88264981.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/75966f4e5286/vmlinux-88264981.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/7f1578876250/bzImage-88264981.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+e01fa33e67abb0b3b3bb@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ==================================================================
> > BUG: KCSAN: data-race in mas_wr_store_entry / mtree_range_walk
> >
> > write to 0xffff888114555710 of 8 bytes by task 9573 on cpu 1:

This is:

	wr_mas->pivots[offset] = mas->index - 1;

> >  mas_wr_slot_store lib/maple_tree.c:3889 [inline]
> >  mas_wr_store_entry+0x146b/0x2d00 lib/maple_tree.c:4075
> >  mas_store_prealloc+0x6bf/0x960 lib/maple_tree.c:5520
> >  vma_iter_store mm/vma.h:470 [inline]
> >  commit_merge+0x441/0x740 mm/vma.c:609
> >  vma_expand+0x211/0x360 mm/vma.c:1024
> >  vma_merge_new_range+0x2cf/0x3e0 mm/vma.c:963
> >  mmap_region+0x887/0x16e0 mm/mmap.c:1416
> >  do_mmap+0x718/0xb60 mm/mmap.c:496
> >  vm_mmap_pgoff+0x133/0x290 mm/util.c:588
> >  ksys_mmap_pgoff+0xd0/0x330 mm/mmap.c:542
> >  x64_sys_call+0x1884/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:10
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > read to 0xffff888114555710 of 8 bytes by task 9574 on cpu 0:
> >  mtree_range_walk+0x1b4/0x460 lib/maple_tree.c:2779

This is:

	if (pivots[offset] >= mas->index) {

So we appear to be racing on this. But... I don't think there's an actual
race issue here, just maybe a need for a data_race() wrapper or similar.

The lock_vma_under_rcu() call is an optimistic attempt to obtain a VMA, and
will try again if it is unable to get a stable reference to a VMA.
On vma_start_write(), which the writing stack above invokes prior to
writing to the maple tree, we lock the VMA for writing and set
vma->vm_lock_seq equal to vma->vm_mm->mm_lock_seq before unlocking. So we
know that vma->vm_lock_seq == vma->vm_mm->mm_lock_seq right up until
vma_end_write_all() is called after the write is done (via
mmap_write_unlock()).

Then, in lock_vma_under_rcu(), vma_start_read() is called, which bails out
if vma->vm_lock_seq == vma->vm_mm->mm_lock_seq, which it likely would be.

If you were _very_ unlucky with timing you might still get the lock on the
modified VMA. However, if it was detached by then, it would have been
marked detached and lock_vma_under_rcu() will bail out. Finally, if we got
the VMA and it changed (e.g. the expand above), we explicitly check that
the address is still in range; if it is, we're perfectly good to use it.

If lock_vma_under_rcu() bails, it resorts to the slow path: no harm, no
foul.

I don't know why this was triggered by recent changes, maybe a timing
thing, but I don't _think_ we have an issue here; no part of this code
path should really behave differently. If we have further reports in this
area, or any indication of a deeper issue, I'll dive back in.

> >  mas_state_walk lib/maple_tree.c:3601 [inline]
> >  mas_walk+0x16e/0x320 lib/maple_tree.c:4948
> >  lock_vma_under_rcu+0x95/0x260 mm/memory.c:6224
> >  do_user_addr_fault arch/x86/mm/fault.c:1329 [inline]
> >  handle_page_fault arch/x86/mm/fault.c:1481 [inline]
> >  exc_page_fault+0x150/0x650 arch/x86/mm/fault.c:1539
> >  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> >
> > value changed: 0x00007f8311576fff -> 0xffffffff8529a680
>
> This suggests we are failing to acquire an RCU lock on mmap() (though we
> have the write mmap lock).
>
> Maybe we missed an RCU lock at some point, but I'm a little baffled as to
> what could have changed in recent series to adjust this.
>
> I will dig into this and see what's going on.
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 UID: 0 PID: 9574 Comm: syz.0.2084 Tainted: G        W    6.11.0-syzkaller-08481-g88264981f208 #0
> > Tainted: [W]=WARN
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> > ==================================================================
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup