* Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [240512 13:28]:
> * syzbot <syzbot+a941018a091f1a1f9546@xxxxxxxxxxxxxxxxxxxxxxxxx> [240512 05:19]:
> > Hello,
> >
> > syzbot found the following issue on:
>
> First, excellent timing of this report - Sunday on an -rc7 release the
> day before LSF/MM/BPF.
>
> > HEAD commit: dccb07f2914c Merge tag 'for-6.9-rc7-tag' of git://git.kern..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f6734c980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7144b4fe7fbf5900
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a941018a091f1a1f9546
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10306760980000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138c8970980000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/e1fea5a49470/disk-dccb07f2.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/5f7d53577fef/vmlinux-dccb07f2.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/430b18473a18/bzImage-dccb07f2.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+a941018a091f1a1f9546@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P17678/1:b..l
> > rcu: (detected by 1, t=10502 jiffies, g=36541, q=38 ncpus=2)
> > task:syz-executor952 state:R running task stack:28968 pid:17678 tgid:17678 ppid:5114 flags:0x00000002 ...
>
> I cannot say that this isn't the maple tree in an infinite loop, but I
> don't think it is given the information above.  Considering the infinite
> loop scenario would produce the same crash on reproduction but this is
> not what syzbot sees on the git bisect, I think it is not an issue in
> the tree but an issue somewhere else - and probably a corruption issue
> that wasn't detected by kasan (is this possible?).

I was able to recreate this with the provided config and reproducer (but
not my own config).  My trace has no maple tree calls at all:

[ 866.380945][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 866.381464][ C1] rcu: (detected by 1, t=10502 jiffies, g=161409, q=149 ncpus=2)
[ 866.382152][ C1] rcu: All QSes seen, last rcu_preempt kthread activity 10500 (4295023801-4295013301), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 866.383324][ C1] rcu: rcu_preempt kthread starved for 10500 jiffies! g161409 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 866.384952][ C1] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 866.385972][ C1] rcu: RCU grace-period kthread stack dump:
[ 866.386582][ C1] task:rcu_preempt state:R running task stack:27648 pid:16 tgid:16 ppid:2 flags:0x00004000
[ 866.387811][ C1] Call Trace:
[ 866.388164][ C1]  <TASK>
[ 866.388475][ C1]  __schedule+0xf06/0x5cb0
[ 866.388961][ C1]  ? __pfx___lock_acquire+0x10/0x10
[ 866.389528][ C1]  ? __pfx___schedule+0x10/0x10
[ 866.390065][ C1]  ? schedule+0x298/0x350
[ 866.390541][ C1]  ? __pfx_lock_release+0x10/0x10
[ 866.391090][ C1]  ? __pfx___mod_timer+0x10/0x10
[ 866.391633][ C1]  ? lock_acquire+0x1b1/0x560
[ 866.392133][ C1]  ? lockdep_init_map_type+0x16d/0x7e0
[ 866.392709][ C1]  schedule+0xe7/0x350
[ 866.393139][ C1]  schedule_timeout+0x136/0x2a0
[ 866.393654][ C1]  ? __pfx_schedule_timeout+0x10/0x10
[ 866.394142][ C1]  ? __pfx_process_timeout+0x10/0x10
[ 866.394596][ C1]  ? _raw_spin_unlock_irqrestore+0x3b/0x80
[ 866.395137][ C1]  ? prepare_to_swait_event+0xf0/0x470
[ 866.395714][ C1]  rcu_gp_fqs_loop+0x1ab/0xbd0
[ 866.396246][ C1]  ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[ 866.396852][ C1]  ? rcu_gp_init+0xbdb/0x1480
[ 866.397393][ C1]  ? __pfx_rcu_gp_cleanup+0x10/0x10
[ 866.397988][ C1]  rcu_gp_kthread+0x271/0x380
[ 866.398493][ C1]  ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.399063][ C1]  ? lockdep_hardirqs_on+0x7c/0x110
[ 866.399570][ C1]  ? __kthread_parkme+0x143/0x220
[ 866.400045][ C1]  ? __pfx_rcu_gp_kthread+0x10/0x10
[ 866.400535][ C1]  kthread+0x2c1/0x3a0
[ 866.400916][ C1]  ? _raw_spin_unlock_irq+0x23/0x50
[ 866.401409][ C1]  ? __pfx_kthread+0x10/0x10
[ 866.401854][ C1]  ret_from_fork+0x45/0x80
[ 866.402284][ C1]  ? __pfx_kthread+0x10/0x10
[ 866.402718][ C1]  ret_from_fork_asm+0x1a/0x30
[ 866.403167][ C1]  </TASK>

I'm going to see if I can hit the corrupted stack version with kasan
enabled.

Thanks,
Liam