Re: [PATCH] ALSA: seq: Fix RCU stall in snd_seq_write()

Takashi Iwai <tiwai@xxxxxxx> · Tue, 02 Nov 2021 09:33:36 +0100

On Tue, 02 Nov 2021 04:32:22 +0100,
Zqiang wrote:
> 
> If we have a lot of cell object, this cycle may take a long time, and
> trigger RCU stall. insert a conditional reschedule point to fix it.
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 	1-....: (1 GPs behind) idle=9f5/1/0x4000000000000000
> 	softirq=16474/16475 fqs=4916
> 	(t=10500 jiffies g=19249 q=192515)
> NMI backtrace for cpu 1
> ......
> asm_sysvec_apic_timer_interrupt
> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70
> spin_unlock_irqrestore
> snd_seq_prioq_cell_out+0x1dc/0x360
> snd_seq_check_queue+0x1a6/0x3f0
> snd_seq_enqueue_event+0x1ed/0x3e0
> snd_seq_client_enqueue_event.constprop.0+0x19a/0x3c0
> snd_seq_write+0x2db/0x510
> vfs_write+0x1c4/0x900
> ksys_write+0x171/0x1d0
> do_syscall_64+0x35/0xb0
> 
> Reported-by: syzbot+bb950e68b400ab4f65f8@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx>
> ---
>  sound/core/seq/seq_queue.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c
> index d6c02dea976c..f5b1e4562a64 100644
> --- a/sound/core/seq/seq_queue.c
> +++ b/sound/core/seq/seq_queue.c
> @@ -263,6 +263,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>  		if (!cell)
>  			break;
>  		snd_seq_dispatch_event(cell, atomic, hop);
> +		cond_resched();
>  	}
>  
>  	/* Process time queue... */
> @@ -272,6 +273,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>  		if (!cell)
>  			break;
>  		snd_seq_dispatch_event(cell, atomic, hop);
> +		cond_resched();

It's good to have cond_resched() in those places but it must be done
more carefully, as the code path may be called from the non-atomic
context, too.  That is, it must have a check of atomic argument, and
cond_resched() is applied only when atomic==false.

But I still wonder how this gets a RCU stall out of sudden.  Looking
through https://syzkaller.appspot.com/bug?extid=bb950e68b400ab4f65f8
it's triggered by many cases since the end of September...

thanks,

Takashi