>>> If routing change and underrun stop is run at the same time, >>> data abort can be occurred by the following sequence. >>> >>> CPU0: Processing underrun CPU1: Processing routing change >>> dpcm_be_dai_trigger(): dpcm_be_disconnect(): >>> >>> for_each_dpcm_be(fe, stream, dpcm) { >>> >>> spin_lock_irqsave(&fe->card->dpcm_lock, flags); >>> list_del(&dpcm->list_be); >>> list_del(&dpcm->list_fe); >>> spin_unlock_irqrestore(&fe->card->dpcm_lock, flags); >>> kfree(dpcm); >>> >>> struct snd_soc_pcm_runtime *be = dpcm->be; <-- Accessing freed memory >>> >>> To prevent this situation, dpcm_lock is needed during accessing >>> the lists for dpcm links. >> >> Isn't there still a possible inconsistency here introduced by the >> duplication of the BE list? >> >> You protect the list creation, but before you use it in >> dpcm_be_dai_trigger(), there's a time window where the function could be >> pre-empted and a disconnect event might have happened. As a result you >> could trigger a BE that's no longer connected. >> >> What you identified as a race is likely valid, but how to fix it isn't >> clear to me - the DPCM code isn't self-explanatory at all with its use >> in various places of the dpcm_lock spinlock, the pcm mutex, the card mutex. >> >> Ideally we would need to find a way to prevent changes in connections >> while we are doing the triggers, but triggers can take a bit of time if >> they involve any sort of communication over a bus. I really wonder if >> this dpcm_lock should be a mutex and if the model for DPCM really >> involves interrupt contexts as the irqsave/irqrestore mentions hint at. > >To follow-up on this, I started experimenting with a replacement of the >'dpcm_lock' spinlock with a 'dpcm_mutex', see >https://protect2.fireeye.com/v1/url?k=bdfd74d3-e2664dcc-bdfcff9c-000babdfecba-6f3671279e770f0b&q=1&e=7fdf074e-2aa1-44f0-bd52-58f2d26c9bfb&u=https%3A%2F%2Fgithub.com%2Fthesofproject%2Flinux%2Fpull%2F3186 > >If we combine both of our results, the 'right' solution might be to take >this mutex before every use of for_each_dpcm_be(), and unlock it at the >end of the loop, which additional changes to avoid re-taking the same >mutex in helper functions. > >there's still a part in DPCM that I can't figure out, there is an >elaborate trick with an explicit comment > > /* if FE's runtime_update is already set, we're in race; > * process this trigger later at exit > */ > >Which looks like a missing mutex somewhere, or an overkill solution that >might never be needed. > You are right. This patch can't resolve inconsistency problem completely. I thought that even part of the problem can be resolved by this patch and it could help some other developers and me also. And I also thought that invalid trigger on disconnected BE DAI can be protected by the state check in the trigger function like the below. int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream, int cmd) { struct snd_soc_dpcm *dpcm; int ret = 0; for_each_dpcm_be(fe, stream, dpcm) { ....... switch (cmd) { case SNDRV_PCM_TRIGGER_START: /* Following if statement protect invalid control. */ if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) continue; ret = dpcm_do_trigger(dpcm, be_substream, cmd); I really appreciate that there is a project about this problem already. But if the project needs more time to be merged into the mainline, I think that this patch can be used until the project is merged. If you don't mind, would you reconsider this patch one more time? Thank you, Gyeongtaek Lee.