RE: [PATCH] ASoC: dpcm: fix race condition to dpcm links in dpcm_be_dai_trigger

"Gyeongtaek Lee" <gt82.lee@xxxxxxxxxxx> · Thu, 30 Sep 2021 12:48:09 +0900

>>> If routing change and underrun stop is run at the same time,
>>> data abort can be occurred by the following sequence.
>>>
>>> CPU0: Processing underrun 	CPU1: Processing routing change
>>> dpcm_be_dai_trigger():		dpcm_be_disconnect():
>>>
>>> for_each_dpcm_be(fe, stream, dpcm) {
>>>
>>> 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
>>> 				list_del(&dpcm->list_be);
>>> 				list_del(&dpcm->list_fe);
>>> 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
>>> 				kfree(dpcm);
>>>
>>> struct snd_soc_pcm_runtime *be = dpcm->be; <-- Accessing freed memory
>>>
>>> To prevent this situation, dpcm_lock is needed during accessing
>>> the lists for dpcm links.
>> 
>> Isn't there still a possible inconsistency here introduced by the
>> duplication of the BE list?
>> 
>> You protect the list creation, but before you use it in
>> dpcm_be_dai_trigger(), there's a time window where the function could be
>> pre-empted and a disconnect event might have happened. As a result you
>> could trigger a BE that's no longer connected.
>> 
>> What you identified as a race is likely valid, but how to fix it isn't
>> clear to me - the DPCM code isn't self-explanatory at all with its use
>> in various places of the dpcm_lock spinlock, the pcm mutex, the card mutex.
>> 
>> Ideally we would need to find a way to prevent changes in connections
>> while we are doing the triggers, but triggers can take a bit of time if
>> they involve any sort of communication over a bus. I really wonder if
>> this dpcm_lock should be a mutex and if the model for DPCM really
>> involves interrupt contexts as the irqsave/irqrestore mentions hint at.
>
>To follow-up on this, I started experimenting with a replacement of the
>'dpcm_lock' spinlock with a 'dpcm_mutex', see
>https://protect2.fireeye.com/v1/url?k=bdfd74d3-e2664dcc-bdfcff9c-000babdfecba-6f3671279e770f0b&q=1&e=7fdf074e-2aa1-44f0-bd52-58f2d26c9bfb&u=https%3A%2F%2Fgithub.com%2Fthesofproject%2Flinux%2Fpull%2F3186
>
>If we combine both of our results, the 'right' solution might be to take
>this mutex before every use of for_each_dpcm_be(), and unlock it at the
>end of the loop, which additional changes to avoid re-taking the same
>mutex in helper functions.
>
>there's still a part in DPCM that I can't figure out, there is an
>elaborate trick with an explicit comment
>
>	/* if FE's runtime_update is already set, we're in race;
>	 * process this trigger later at exit
>	 */
>
>Which looks like a missing mutex somewhere, or an overkill solution that
>might never be needed.
>
You are right.
This patch can't resolve inconsistency problem completely.
I thought that even part of the problem can be resolved by this patch and
it could help some other developers and me also.
And I also thought that invalid trigger on disconnected BE DAI can be protected
by the state check in the trigger function like the below.

int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
			       int cmd)
{
	struct snd_soc_dpcm *dpcm;
	int ret = 0;

	for_each_dpcm_be(fe, stream, dpcm) {
.......
		switch (cmd) {
		case SNDRV_PCM_TRIGGER_START:
			/* Following if statement protect invalid control. */
			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
				continue;

			ret = dpcm_do_trigger(dpcm, be_substream, cmd);

I really appreciate that there is a project about this problem already.
But if the project needs more time to be merged into the mainline,
I think that this patch can be used until the project is merged.
If you don't mind, would you reconsider this patch one more time?

Thank you,
Gyeongtaek Lee.