On Mon, 08 Jan 2018 17:00:08 +0100, Lars-Peter Clausen wrote: > > On 01/08/2018 04:40 PM, Takashi Iwai wrote: > > On Mon, 08 Jan 2018 16:37:32 +0100, > > Lars-Peter Clausen wrote: > >> > >> On 01/08/2018 04:05 PM, Takashi Iwai wrote: > >>> On Mon, 08 Jan 2018 15:58:47 +0100, > >>> Lars-Peter Clausen wrote: > >>>> > >>>> On 01/08/2018 03:25 PM, Takashi Iwai wrote: > >>>>> Hi, > >>>>> > >>>>> this contains two fixes for PCM OSS bugs that have been triggered by > >>>>> syzbot. These are simple and straightforward, and I'm going to queue > >>>>> for 4.15-rc8. > >>>> > >>>> I tried to reproduce the issue as well with the reproducer generated by > >>>> syzbot. But what syzbot reported was that the CPU didn't sleep for 140 > >>>> seconds. But with a long write() we'll go to sleep in between frames. So > >>>> when I tried the lockup detector never triggered. > >>>> > >>>> Were you able to reproduce the same CPU lockup as syzbot? > >>> > >>> Yes, I could reproduce RCU stalls with two reproducers > >>> (001a113ece5c2b600d05620b097a@xxxxxxxxxx and > >>> 001a1147e1988b160a05620552eb@xxxxxxxxxx) > >>> > >>> The patches seem curing them, so far, in combination with other > >>> patches in for-linus branch of sound.git tree. > >>> > >> > >> Hm, interesting. Are you using aloop for the backend or real hardware? > > > > The RCU stall happened also with dummy driver. Just feeding a GB of > > chunk to /dev/audio and aborting the stream concurrently, then it > > stalls at that point. > > Hm, does that mean that snd_pcm_playback_avail() is never 0 for the dummy > driver? Usually we should catch the signal_pending() in wait_for_avail(). > > This might not be OSS specific and maybe it is possible to trigger it from > ALSA as well. This seems specific to OSS. The problem is that OSS emulation has two nested loops in a mutex protected context. The outer loop is like: while (bytes > 0) { __snd_pcm_lib_xfer(); if (error) break; } return xfer > 0 ? xfer : error; and the inner loop (__snd_pcm_lib_xfer()) is like: while (bytes > 0) { if (avail() > 0) write(); else if (wait_for_avail() < 0) break; } return xfer > 0 ? xfer : error; Actually, wait_for_avail() does check signal_pending() and returns -ERESTARTSYS properly. This makes the inner loop aborted. However, the problem is that OSS emulation sets avail_min=1: implying that, when there is any samples empty in the ring buffer, it can continue. In the case above, when some samples have been processed before wait_for_avail(), it keeps the outer loop spinning again. Since each iteration takes ca 2ms on my VM, 8000Hz rate gives 1 or 2 samples free at each iteration. Thus at each time snd_pcm_lib_xfer() is invoked, it processes 1 or 2 samples, then returns due to pending signals, and the outer loop continues until the whole given samples are consumed. If the machine is faster or the sample rate is lower, the outer loop may exit, OTOH. In ALSA-native API, there is no outer loop in the kernel context, and each read/write merely calling a single shot snd_pcm_lib_xfer(). So, when the signal is pending, it aborts properly. Takashi _______________________________________________ Alsa-devel mailing list Alsa-devel@xxxxxxxxxxxxxxxx http://mailman.alsa-project.org/mailman/listinfo/alsa-devel