>> If routing change and underrun stop is run at the same time, >> data abort can be occurred by the following sequence. >> >> CPU0: Processing underrun CPU1: Processing routing change >> dpcm_be_dai_trigger(): dpcm_be_disconnect(): >> >> for_each_dpcm_be(fe, stream, dpcm) { >> >> spin_lock_irqsave(&fe->card->dpcm_lock, flags); >> list_del(&dpcm->list_be); >> list_del(&dpcm->list_fe); >> spin_unlock_irqrestore(&fe->card->dpcm_lock, flags); >> kfree(dpcm); >> >> struct snd_soc_pcm_runtime *be = dpcm->be; <-- Accessing freed memory >> >> To prevent this situation, dpcm_lock is needed during accessing >> the lists for dpcm links. > > Isn't there still a possible inconsistency here introduced by the > duplication of the BE list? > > You protect the list creation, but before you use it in > dpcm_be_dai_trigger(), there's a time window where the function could be > pre-empted and a disconnect event might have happened. As a result you > could trigger a BE that's no longer connected. > > What you identified as a race is likely valid, but how to fix it isn't > clear to me - the DPCM code isn't self-explanatory at all with its use > in various places of the dpcm_lock spinlock, the pcm mutex, the card mutex. > > Ideally we would need to find a way to prevent changes in connections > while we are doing the triggers, but triggers can take a bit of time if > they involve any sort of communication over a bus. I really wonder if > this dpcm_lock should be a mutex and if the model for DPCM really > involves interrupt contexts as the irqsave/irqrestore mentions hint at. To follow-up on this, I started experimenting with a replacement of the 'dpcm_lock' spinlock with a 'dpcm_mutex', see https://github.com/thesofproject/linux/pull/3186 If we combine both of our results, the 'right' solution might be to take this mutex before every use of for_each_dpcm_be(), and unlock it at the end of the loop, which additional changes to avoid re-taking the same mutex in helper functions. there's still a part in DPCM that I can't figure out, there is an elaborate trick with an explicit comment /* if FE's runtime_update is already set, we're in race; * process this trigger later at exit */ Which looks like a missing mutex somewhere, or an overkill solution that might never be needed.