On Tue, 02 Nov 2021 04:32:22 +0100,
Zqiang wrote:
>
> If we have a lot of cell object, this cycle may take a long time, and
> trigger RCU stall. insert a conditional reschedule point to fix it.
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (1 GPs behind) idle=9f5/1/0x4000000000000000
> softirq=16474/16475 fqs=4916
> (t=10500 jiffies g=19249 q=192515)
> NMI backtrace for cpu 1
> ......
> asm_sysvec_apic_timer_interrupt
> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70
> spin_unlock_irqrestore
> snd_seq_prioq_cell_out+0x1dc/0x360
> snd_seq_check_queue+0x1a6/0x3f0
> snd_seq_enqueue_event+0x1ed/0x3e0
> snd_seq_client_enqueue_event.constprop.0+0x19a/0x3c0
> snd_seq_write+0x2db/0x510
> vfs_write+0x1c4/0x900
> ksys_write+0x171/0x1d0
> do_syscall_64+0x35/0xb0
>
> Reported-by: syzbot+bb950e68b400ab4f65f8@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx>
> ---
>  sound/core/seq/seq_queue.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c
> index d6c02dea976c..f5b1e4562a64 100644
> --- a/sound/core/seq/seq_queue.c
> +++ b/sound/core/seq/seq_queue.c
> @@ -263,6 +263,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>                  if (!cell)
>                          break;
>                  snd_seq_dispatch_event(cell, atomic, hop);
> +                cond_resched();
>          }
>
>          /* Process time queue... */
> @@ -272,6 +273,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>                  if (!cell)
>                          break;
>                  snd_seq_dispatch_event(cell, atomic, hop);
> +                cond_resched();

It's good to have cond_resched() in those places, but it must be done
more carefully, as the code path may be called from the atomic
context, too.  That is, it must check the atomic argument, and
cond_resched() is applied only when atomic==false.

But I still wonder how this gets an RCU stall all of a sudden.
Looking through
  https://syzkaller.appspot.com/bug?extid=bb950e68b400ab4f65f8
it's triggered by many cases since the end of September...


thanks,

Takashi
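
For illustration only, the atomic check suggested above can be modelled in plain userspace C. This is a rough sketch, not the actual kernel change: cond_resched_stub() and check_queue_model() are hypothetical stand-ins (the stub merely counts calls), and the pending counter stands in for the cells that snd_seq_prioq_cell_out() would return. Only the shape of the "reschedule only when atomic is false" test is meant to carry over.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's cond_resched();
 * it only counts how often a reschedule point was reached. */
static int resched_calls;

static void cond_resched_stub(void)
{
	resched_calls++;
}

/* Hypothetical model of one drain loop in snd_seq_check_queue():
 * 'pending' models the number of cells ready to be dispatched,
 * 'atomic' has the same meaning as the argument in the patch.
 * Returns the number of cells dispatched. */
static int check_queue_model(int pending, int atomic)
{
	int dispatched = 0;

	while (pending > 0) {
		pending--;	/* models taking a cell off the queue */
		dispatched++;	/* models snd_seq_dispatch_event() */

		/* Rescheduling is only allowed outside atomic context. */
		if (!atomic)
			cond_resched_stub();
	}

	return dispatched;
}
```

With atomic set, the loop dispatches every cell but never reaches the reschedule point; with atomic clear, it reaches it once per cell.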