On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
>
> On Wed, 13 Nov 2019 10:47:51 +0100,
> Takashi Iwai wrote:
> >
> > On Wed, 13 Nov 2019 08:24:41 +0100,
> > Chih-Yang Hsia wrote:
> > >
> > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@xxxxxxx> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > paulhsia wrote:
> > > > >
> > > > > Since
> > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > >   stream lock.
> > > > >
> > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > snd_pcm_period_elapsed.
> > > >
> > > > Well, if a stream is detached, it means that the stream must have been
> > > > already closed; i.e. it's already a clear bug in the driver that
> > > > snd_pcm_period_elapsed() is called against such a stream.
> > > >
> > > > Or am I missing other possible case?
> > > >
> > > >
> > > > thanks,
> > > >
> > > > Takashi
> > > >
> > >
> > > In a multithreaded environment, it is possible for both
> > > `interrupt_handler` (from irq) and `substream close` (from
> > > snd_pcm_release) to run at the same time.
> > > Therefore, in a driver implementation, if the "substream close function"
> > > and the code section that calls snd_pcm_period_elapsed() do not hold the
> > > same lock, then the following things can happen:
> > >
> > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > >    substream pointer
> > > 2. snd_pcm_release_substream: call close without blocking
> > > 3. snd_pcm_release_substream: call snd_pcm_detach_substream and set
> > >    substream->runtime to NULL
> > > 4. interrupt_handler -> call snd_pcm_running() and crash while
> > >    accessing fields in `substream->runtime`
> > >
> > > e.g.
> > > In the intel8x0.c driver for the ac97 device,
> > > `snd_pcm_period_elapsed` is called after
> > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > If a `snd_pcm_release` call from alsa-lib passes through close()
> > > and reaches snd_pcm_detach_substream() in another thread, it's possible
> > > to trigger a crash.
> > > I can reproduce the issue within a multithread VM easily.
> > >
> > > My patches are trying to provide a basic protection for this situation
> > > (an internal pcm lock between detach and elapsed), since
> > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > >   the possible race if the driver does not force the order of calling
> > >   `snd_pcm_period_elapsed` and `close` by a lock and
> > > - lots of drivers already have this hidden issue and I can't fix them
> > >   one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > >   "close implementation" within all the drivers). The most common
> > >   mistake is that
> > >   - the driver checks that the substream is not null and calls into
> > >     snd_pcm_period_elapsed
> > >   - but `close` can happen anytime, pass without blocking, and
> > >     snd_pcm_detach_substream will be triggered right after it
> >
> > Thanks, point taken. While this argument is valid and it's good to
> > harden the PCM core side, the concurrent calls are basically a bug,
> > and we'd need another fix anyway. Also, the patch 2 makes little
> > sense; there can't be multiple close calls racing with each other. So
> > I'll go for taking your fix but only the first patch.
> >
> > Back to this race: the surfaced issue is, as you pointed out, the race
> > between snd_pcm_period_elapsed() vs close call. However, the
> > fundamental problem is the pending action after the PCM trigger-stop
> > call. Since the PCM trigger doesn't block nor wait until the hardware
> > actually stops the things, the driver may go to the other step even
> > after this "supposed-to-be-stopped" point.
> > In your case, it goes up
> > to close, and crashes. If we had a sync-stop operation, the interrupt
> > handler should have finished before moving to the close stage, hence
> > such a race could be avoided.
> >
> > It's been a long known problem, and some drivers have their own
> > implementation for stop-sync. I think it's time to investigate and
> > start implementing the fundamental solution.
>
> BTW, what we need essentially for intel8x0 is to just call
> synchronize_irq() before closing, at best in hw_free procedure:
>
> --- a/sound/pci/intel8x0.c
> +++ b/sound/pci/intel8x0.c
> @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
>
>  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
>  {
> +	struct intel8x0 *chip = snd_pcm_substream_chip(substream);
>  	struct ichdev *ichdev = get_ichdev(substream);
>
> +	synchronize_irq(chip->irq);
>  	if (ichdev->pcm_open_flag) {
>  		snd_ac97_pcm_close(ichdev->pcm);
>  		ichdev->pcm_open_flag = 0;
>
>
> The same would be needed also at the beginning of the prepare, as the
> application may restart the stream without release.
>
> My idea is to add sync_stop PCM ops and call it from PCM core at
> snd_pcm_prepare() and snd_pcm_hw_free().
>

Will adding synchronize_irq() in snd_pcm_hw_free fix the race issue?
Is a sequence like the following possible?

- [Thread 1] snd_pcm_hw_free: just passes synchronize_irq()
- [Thread 2] another interrupt comes -> snd_intel8x0_update() -> goes
  into the lock region of snd_pcm_period_elapsed() and passes the
  PCM_RUNTIME_CHECK (right before snd_pcm_running())
- [Thread 1] snd_pcm_hw_free() finishes -> snd_pcm_detach_substream()
  -> runtime = NULL
- [Thread 2] executes snd_pcm_running() and crashes

I can't trigger the issue after adding the synchronize_irq(), but
maybe it's just luck. Correct me if I missed something.
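To make the question concrete, here is a small userspace sketch (not
kernel code) of the property I would expect from the first patch: if the
NULL check and the use of runtime sit in the same lock region, and detach
clears runtime under that same lock, the interleaving above cannot
dereference a NULL runtime. The struct and function names are made up for
illustration, and a pthread mutex only stands in for the PCM stream lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real ALSA structures. */
struct fake_runtime {
	int running;
};

struct fake_substream {
	pthread_mutex_t lock;         /* plays the role of the stream lock */
	struct fake_runtime *runtime; /* cleared on detach */
};

/*
 * Models the interrupt path: the NULL check and the access to runtime
 * happen inside one critical section.
 * Returns the running flag, or -1 if the stream was already detached.
 */
static int fake_period_elapsed(struct fake_substream *s)
{
	int ret = -1;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	if (s->runtime)
		ret = s->runtime->running;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Models the close path: runtime is cleared while holding the same lock. */
static void fake_detach(struct fake_substream *s)
{
	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	s->runtime = NULL;
	pthread_mutex_unlock(&s->lock);
}
```

With both paths under one lock, a late interrupt sees either a valid
runtime for the whole critical section or NULL, never a pointer that
disappears between the check and the use. This says nothing about whether
synchronize_irq() alone is enough, which is what I am asking.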

Thanks,
Paul

>
> thanks,
>
> Takashi
_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
https://mailman.alsa-project.org/mailman/listinfo/alsa-devel