Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE

Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> · Wed, 6 Oct 2021 14:47:33 -0500

> I tested this further. It appears that things work fine till 'patch 3/5'
> of yours. After I take 'patch 4/5', the print "BUG: scheduling while
> atomic: aplay" started showing up and I see a hang. This failure was
> seen for 2x1 mixer test itself.
> 
> The 'patch 4/5' introduces mutex_lock/unlock() in dpcm_be_dai_trigger().
> This seems to be the problem, since trigger() runs in atomic context
> depending on the PCM 'nonatomic' flag. I am not sure if your design sets
> 'nonatomic' flag by default and that is why the issue is not seen at
> your end?
> 
> With below (just for testing purpose), tests ran well. I was able to run
> 2x1, 3x1 ... 10x1 mixer tests.

This is useful information, thanks a lot Sameer for your time.

Unfortunately removing the lines below will not work, that's
precisely what's needed to prevent multiple triggers from being sent to
the same shared BE.

I don't have a clear idea why we see different results, and now I have
even less understanding of the ALSA/ASoC/DPCM locking model. We use
card->mutex, card->pcm_mutex, card->dpcm_lock,
substream->self_group.mutex or lock, client_mutex, dapm_mutex....

I don't think any of the DPCM code is ever called from an interrupt
context, so the errors you reported seem more like a card configuration,
specifically the interaction with 'aplay'/userspace will happen for FEs.
BEs are mostly hidden to userspace.

One possible difference is that all our DPCM solutions are based on
non-atomic FEs (since they all involve an IPC with a DSP), so we would
always use the top banch of these tests:

	if (nonatomic) \
		mutex_ ## mutex_action(&group->mutex); \
	else \
		spin_ ## action(&group->lock);

I don't see this nonatomic flag set anywhere in the sound/soc/tegra
code, so it's possible that in your case the spin_lock/spin_lock_irq is
used before triggering the FE, and using a mutex before the BE trigger
throws errors? I think Takashi mentioned that the BEs inherit the
properties of the FE to some extent.

Looking at the code, I see that only Intel legacy, SOF and Qualcomm
drivers set nonatomic=1, so maybe we can assume that that FEs for a card
are either all atomic or all non-atomic, maybe we could then use a
spinlock or a mutex. But if the FEs can be different then I am not sure
what locking model might work, we can't have a BE protected by a
spinlock or a mutex depending on the property of the FE. We need one
method only to guarantee a mutual exclusion.

Takashi, Mark, do you think that an all/none assumption on FE nonatomic
properties would make sense?