* syzbot <syzbot+a941018a091f1a1f9546@xxxxxxxxxxxxxxxxxxxxxxxxx> [240512 05:19]:
> Hello,
>
> syzbot found the following issue on:

First, excellent timing of this report - Sunday on an -rc7 release the
day before LSF/MM/BPF.

>
> HEAD commit: dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f6734c980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=7144b4fe7fbf5900
> dashboard link: https://syzkaller.appspot.com/bug?extid=a941018a091f1a1f9546
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10306760980000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138c8970980000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/e1fea5a49470/disk-dccb07f2.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/5f7d53577fef/vmlinux-dccb07f2.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/430b18473a18/bzImage-dccb07f2.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a941018a091f1a1f9546@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P17678/1:b..l
> rcu: (detected by 1, t=10502 jiffies, g=36541, q=38 ncpus=2)
> task:syz-executor952 state:R running task stack:28968 pid:17678 tgid:17678 ppid:5114 flags:0x00000002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5409 [inline]
> __schedule+0xf15/0x5d00 kernel/sched/core.c:6746
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7068
> irqentry_exit+0x36/0x90 kernel/entry/common.c:354
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:88 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:122 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0xc7/0x1a0 mm/kasan/generic.c:189
> Code: 83 c0 08 48 39 d0 0f 84 be 00 00 00 48 83 38 00 74 ed 48 8d 50 08 eb 0d 48 83 c0 01 48 39 c2 0f 84 8d 00 00 00 80 38 00 74 ee <48> 89 c2 b8 01 00 00 00 48 85 d2 74 1e 41 83 e2 07 49 39 d1 75 0a
> RSP: 0018:ffffc900031ef850 EFLAGS: 00000202
> RAX: fffffbfff2949b78 RBX: fffffbfff2949b79 RCX: ffffffff8ac92249
> RDX: fffffbfff2949b79 RSI: 0000000000000004 RDI: ffffffff94a4dbc0
> RBP: fffffbfff2949b78 R08: 0000000000000001 R09: fffffbfff2949b78
> R10: ffffffff94a4dbc3 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000001 R14: 0000000000000300 R15: 0000000000000000
> instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
> atomic_inc include/linux/atomic/atomic-instrumented.h:435 [inline]
> mt_validate_nulls+0x5e9/0x9e0 lib/maple_tree.c:7550
> mt_validate+0x3148/0x4390 lib/maple_tree.c:7599
> validate_mm+0x9c/0x4b0 mm/mmap.c:288
> mmap_region+0x1478/0x2760 mm/mmap.c:2934
> do_mmap+0x8ae/0xf10 mm/mmap.c:1385
> vm_mmap_pgoff+0x1ab/0x3c0 mm/util.c:573
> ksys_mmap_pgoff+0x7d/0x5b0 mm/mmap.c:1431
...

I was concerned that we had somehow constructed a broken tree, but I
believe the information below rules that situation out.

It appears that the verification of a task's maple tree has exceeded the
timeout allotted to do so. This call stack indicates it is all happening
while holding the mmap lock, so no locking or RCU issue there.

This trace seems to think we are stuck checking the tree for sequential
NULLs, but not in the tree operation itself. This would indicate the
issue isn't here at all - or we have a broken tree which causes the
iteration to never advance.

The adjustments of the timeouts do seem to be sufficient, and my VM
running the C reproducer has not hung yet.

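For readers unfamiliar with the check named in the trace, the idea of validating for sequential NULLs can be pictured with a minimal userspace sketch. This is not the code in lib/maple_tree.c: the function name count_sequential_nulls() and the flat array standing in for the tree's leaf slots are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace illustration, not the kernel's mt_validate_nulls().
 * A flat array stands in for the leaf slots of a tree, in index order.
 * Returns how many adjacent pairs of slots are both NULL.
 */
static size_t count_sequential_nulls(void *const *slots, size_t n)
{
	size_t bad = 0;
	int last_was_null = 0;	/* nothing precedes the first slot */

	assert(slots != NULL || n == 0);

	/* i advances on every pass, so the walk ends after n steps. */
	for (size_t i = 0; i < n; i++) {
		int is_null = (slots[i] == NULL);

		if (is_null && last_was_null)
			bad++;
		last_was_null = is_null;
	}
	return bad;
}
```

In this simplified form the walk is bounded by the array length, so it always terminates. A walk over a real tree depends on each step reaching a new node, which is why a structurally broken tree could, in principle, keep such a loop from advancing.
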
I am not using the bot's config yet.

I also noticed that the git bisect is very odd and inconsistent, often
ending in "crashed: INFO: rcu detected stall in corrupted".

I also noticed that KASAN is disabled in this report?
"disabling configs for [UBSAN BUG KASAN LOCKDEP ATOMIC_SLEEP LEAK], they
are not needed"

This seems like it would be wise to enable, as there appear to be
corrupted stack traces, at least? I noticed that the .config DOES have
KASAN enabled, so I guess it was dropped because it didn't pick up an
issue on the initial run?

There is only one report (the initial report) that detects the hung
state in the validate_mm() test function. This is actually the least
concerning of all of the places - because this validate function is
generally disabled on production systems.

The last change to lib/maple_tree.c went in through
mm-stable-2024-03-13-20-04.

I cannot say that this isn't the maple tree in an infinite loop, but I
don't think it is, given the information above. An infinite loop would
produce the same crash on every reproduction, and that is not what
syzbot sees during the git bisect. So I think it is not an issue in the
tree but an issue somewhere else - and probably a corruption issue that
wasn't detected by KASAN (is this possible?).

Thanks,
Liam