Re: [syzbot] [mm?] KCSAN: data-race in mtree_range_walk / rcu_segcblist_enqueue (2)

"Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> · Fri, 21 Jun 2024 13:31:14 -0400

* Marco Elver <elver@xxxxxxxxxx> [240621 11:29]:
> [+Cc rcu folks]
> 
> On Fri, 21 Jun 2024 at 15:29, syzbot
> <syzbot+9bb7d0f2fdb4229b9d67@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    50736169ecc8 Merge tag 'for-6.10-rc4-tag' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=164ec02a980000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=704451bc2941bcb0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9bb7d0f2fdb4229b9d67
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/e4cbed12fec1/disk-50736169.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/d50b5dcae4cd/vmlinux-50736169.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/f2c14c5fcce2/bzImage-50736169.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+9bb7d0f2fdb4229b9d67@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ==================================================================
> > BUG: KCSAN: data-race in mtree_range_walk / rcu_segcblist_enqueue
> >
> > write to 0xffff888104077308 of 8 bytes by task 12265 on cpu 1:
> >  rcu_segcblist_enqueue+0x67/0xb0 kernel/rcu/rcu_segcblist.c:345
> >  rcutree_enqueue kernel/rcu/tree.c:2940 [inline]
> >  call_rcu_core kernel/rcu/tree.c:2957 [inline]
> >  __call_rcu_common kernel/rcu/tree.c:3093 [inline]
> >  call_rcu+0x1bd/0x430 kernel/rcu/tree.c:3176
> >  ma_free_rcu lib/maple_tree.c:197 [inline]
> >  mas_free lib/maple_tree.c:1304 [inline]
> >  mas_replace_node+0x2f8/0x440 lib/maple_tree.c:1741
> >  mas_wr_node_store lib/maple_tree.c:3956 [inline]
> >  mas_wr_modify+0x2bc3/0x3c90 lib/maple_tree.c:4189
> >  mas_wr_store_entry+0x250/0x390 lib/maple_tree.c:4229
> >  mas_store_prealloc+0x151/0x2b0 lib/maple_tree.c:5485
> >  vma_iter_store mm/internal.h:1398 [inline]
> >  vma_complete+0x3a7/0x760 mm/mmap.c:535
> >  __split_vma+0x623/0x690 mm/mmap.c:2440
> >  split_vma mm/mmap.c:2466 [inline]
> >  vma_modify+0x198/0x1f0 mm/mmap.c:2507
> >  vma_modify_flags include/linux/mm.h:3347 [inline]
> >  mprotect_fixup+0x335/0x610 mm/mprotect.c:637
> >  do_mprotect_pkey+0x673/0x9a0 mm/mprotect.c:820
> >  __do_sys_mprotect mm/mprotect.c:841 [inline]
> >  __se_sys_mprotect mm/mprotect.c:838 [inline]
> >  __x64_sys_mprotect+0x48/0x60 mm/mprotect.c:838
> >  x64_sys_call+0x26f5/0x2d70 arch/x86/include/generated/asm/syscalls_64.h:11
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > read to 0xffff888104077308 of 8 bytes by task 12266 on cpu 0:
> >  mtree_range_walk+0x140/0x460 lib/maple_tree.c:2774
> >  mas_state_walk lib/maple_tree.c:3678 [inline]
> >  mas_walk+0x16e/0x320 lib/maple_tree.c:4909
> >  lock_vma_under_rcu+0x84/0x260 mm/memory.c:5840
> >  do_user_addr_fault arch/x86/mm/fault.c:1329 [inline]
> >  handle_page_fault arch/x86/mm/fault.c:1481 [inline]
> >  exc_page_fault+0x150/0x650 arch/x86/mm/fault.c:1539
> >  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 PID: 12266 Comm: syz-executor.3 Not tainted 6.10.0-rc4-syzkaller-00148-g50736169ecc8 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> > ==================================================================
> 
> This is not an ordinary data race. I suspect this to be an incorrect
> use of RCU, resulting in some kind of use-after-free / type-confusion.
> 
> The access within rcu_segcblist_enqueue() is to maple_node::rcu (at
> offset 8 into maple_node). The racing access in mtree_range_walk() is
> to either maple_node::mr64::pivot[0] or maple_node::ma64::pivot[0]
> (both also offset 8 into maple_node).

Since the node is not actually freed until an RCU grace period has
elapsed, and the reader holds the RCU read lock, there is no
use-after-free risk here.

Both are at offset 8 of the node, but there is no type confusion.

This is a false positive, which I can explain.

The reader in mtree_range_walk() at line 2774 reads pivot[0] at offset 8,
but validates that value by checking the parent pointer at offset 0
before using it.  In this case the check is on line 2793:
if (unlikely(ma_dead_node(node)))...

If the reader has read stale data, that data is thrown away and the walk
is started again.  This node has already been taken out of the tree and
will not be encountered again.
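A minimal userspace sketch of the read/validate/retry pattern described above, using C11 atomics. Everything here is hypothetical: struct toy_node, toy_dead_node(), toy_walk() and toy_node_init() are illustrative names, not kernel code. The acquire fence only approximates smp_rmb(), and the real mtree_range_walk() restarts the whole walk rather than reloading one root pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy node, NOT struct maple_node: a parent pointer first,
 * then one pivot.  A node is dead once its parent points at itself. */
struct toy_node {
	_Atomic(void *) parent;
	_Atomic unsigned long pivot0;
};

/* Analogue of ma_dead_node(): the acquire fence approximates smp_rmb(),
 * keeping the earlier pivot read ordered before the parent check. */
static bool toy_dead_node(struct toy_node *node)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&node->parent, memory_order_relaxed) ==
	       (void *)node;
}

/* Optimistic read: load the pivot, then validate.  If the node was dead,
 * the value may be stale (or unrelated data sharing the offset), so it is
 * discarded and the read restarts from the current root.  '*restarts'
 * counts how many times that happened. */
unsigned long toy_walk(struct toy_node *node,
		       _Atomic(struct toy_node *) *root, int *restarts)
{
	for (;;) {
		unsigned long piv =
			atomic_load_explicit(&node->pivot0, memory_order_relaxed);

		if (!toy_dead_node(node))
			return piv;	/* validated: safe to use */

		(*restarts)++;		/* stale: throw away and retry */
		node = atomic_load_explicit(root, memory_order_acquire);
	}
}

/* Initialise a node with a given parent marker and pivot value. */
void toy_node_init(struct toy_node *node, void *parent, unsigned long piv)
{
	atomic_init(&node->parent, parent);
	atomic_init(&node->pivot0, piv);
}
```

Starting a walk on a node that is already marked dead yields exactly one restart and returns the pivot of the node currently at the root.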

Note that all node types have the same parent pointer at offset 0 (of
the deliberately undefined type struct maple_pnode *, to catch type
confusion at compile time).
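To make the overlap concrete, here is a simplified, hypothetical model of the layout. It is not the real struct maple_node definition, which has more views and members. struct toy_node, struct toy_pnode, struct toy_rcu_head, TOY_PIVOTS and toy_shared_offset() are illustrative names. It shows a shared parent pointer at offset 0, plus a reader view and a freeing view that both place their first payload word at the same offset (8 on a 64-bit build).

```c
#include <assert.h>
#include <stddef.h>

/* Left incomplete on purpose, like struct maple_pnode: it can only be
 * handled through a pointer, never dereferenced. */
struct toy_pnode;

/* Stand-in for struct rcu_head: two pointer-sized words. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

#define TOY_PIVOTS 4	/* hypothetical; the kernel uses more */

struct toy_node {
	union {
		/* Reader view: parent pointer, then pivots. */
		struct {
			struct toy_pnode *parent;
			unsigned long pivot[TOY_PIVOTS];
		} mr64;
		/* Freeing view: pad over the parent, then the rcu head. */
		struct {
			void *pad;
			struct toy_rcu_head rcu;
		} freeing;
	};
};

/* Every view keeps the parent pointer at offset 0 ... */
_Static_assert(offsetof(struct toy_node, mr64.parent) == 0,
	       "parent must be at offset 0");
/* ... and pivot[0] shares its offset with the rcu head, which is why
 * KCSAN sees a read and a write to the same address. */
_Static_assert(offsetof(struct toy_node, mr64.pivot) ==
	       offsetof(struct toy_node, freeing.rcu),
	       "pivot[0] and rcu overlap");

/* Byte offset shared by pivot[0] and the rcu head. */
size_t toy_shared_offset(void)
{
	return offsetof(struct toy_node, freeing.rcu);
}
```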

On the writer side, a node is marked dead by setting its struct
maple_pnode *parent to the address of the node itself.  This is done by
lib/maple_tree.c:mte_set_node_dead(), which sets the parent pointer and
then issues smp_wmb().  This pairs with ma_dead_node() or
mte_dead_node(), which issue smp_rmb() before reading the parent
pointer.
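A rough userspace sketch of that barrier pairing, using C11 fences in place of smp_wmb()/smp_rmb(). The release fence is stronger than smp_wmb(), so treat it as an approximation. Two things are assumptions: the writer marks the node dead before the call_rcu() write lands on offset 8, and every name (struct toy_node, toy_set_node_dead(), toy_free_node(), toy_read_validated(), toy_node_init()) is hypothetical. The point is the pairing. If the reader's load at offset 8 observes the free-path store, the fences guarantee that its later parent read sees the dead marker, so the overwritten value is never used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: parent at offset 0, and one word at offset 8 that is
 * a pivot while live and is reused (like the rcu head) once freed. */
struct toy_node {
	_Atomic(void *) parent;
	_Atomic unsigned long word8;
};

/* Analogue of mte_set_node_dead(): point parent at the node itself, then a
 * release fence (approximating smp_wmb()) so the marker is ordered before
 * any later store, such as the free-path write below. */
void toy_set_node_dead(struct toy_node *node)
{
	atomic_store_explicit(&node->parent, (void *)node, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Assumed free path: overwrite the shared offset, as queueing the rcu head
 * through call_rcu() does. */
void toy_free_node(struct toy_node *node, unsigned long rcu_word)
{
	atomic_store_explicit(&node->word8, rcu_word, memory_order_relaxed);
}

/* Reader: load offset 8, acquire fence (approximating smp_rmb()), then
 * check the parent.  Stores the value and returns true only if the node
 * was still live; otherwise leaves *out untouched and returns false. */
bool toy_read_validated(struct toy_node *node, unsigned long *out)
{
	unsigned long v = atomic_load_explicit(&node->word8, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&node->parent, memory_order_relaxed) ==
	    (void *)node)
		return false;	/* dead: value may be the rcu head, discard */
	*out = v;
	return true;
}

/* Initialise a live node (no parent marker) with a pivot value. */
void toy_node_init(struct toy_node *node, unsigned long piv)
{
	atomic_init(&node->parent, NULL);
	atomic_init(&node->word8, piv);
}
```

Run single-threaded, this only shows the sequence: read while live, mark dead, overwrite, reject. The ordering guarantee itself matters only when the writer and the reader run concurrently.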

I ran through all of this with Paul (embarrassingly, a while back), and I
believe (if my notes are correct..) that the fix needed here is to use
rcu_assign_pointer() in mte_set_node_dead() to make the checks here
happy.

Thanks,
Liam