Dear All, On 17.12.2021 11:11, Marek Szyprowski wrote: > On 07.12.2021 18:37, Pierre-Louis Bossart wrote: >> From: Takashi Iwai <tiwai@xxxxxxx> >> >> When more than one FE is connected to a BE, e.g. in a mixing use case, >> the BE can be triggered multiple times when the FE are opened/started >> concurrently. This race condition is problematic in the case of >> SoundWire BE dailinks, and this is not desirable in a general >> case. >> >> This patch relies on the existing BE PCM lock, which takes atomicity >> into >> account. The locking model assumes that all interactions start with >> the FE, so that there is no deadlock between FE and BE locks. >> >> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx> >> [test, checkpatch fix and clarification of commit message by plbossart] >> Signed-off-by: Pierre-Louis Bossart >> <pierre-louis.bossart@xxxxxxxxxxxxxxx> >> Reviewed-by: Kai Vehmanen <kai.vehmanen@xxxxxxxxxxxxxxx> >> Reviewed-by: Bard Liao <yung-chuan.liao@xxxxxxxxxxxxxxx> >> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@xxxxxxxxxxxxxxx> > > This patch recently landed in linux-next (next-20211215) as commit > b2ae80663008 ("ASoC: soc-pcm: serialize BE triggers"). It turned out that the issue is caused by the previous patch "[PATCH 3/6] ASoC: soc-pcm: Fix and cleanup DPCM locking <https://lore.kernel.org/all/20211207173745.15850-4-pierre-louis.bossart@xxxxxxxxxxxxxxx/#r>", merged as commit b7898396f4bb ("ASoC: soc-pcm: Fix and cleanup DPCM locking"), for some reasons 'git bisect' pointed me initially to the wrong patch. I will re-report it under the proper patch. Reverting it and the next patches on top of linux-next fixes the issue. > I found that it triggers a warning on my test boards. This is the one > from Exynos4412-based Odroid U3 board: > > # speaker-test -l1 > > speaker-test 1.1.8 > > Playback device is default > Stream parameters are 48000Hz, S16_LE, 1 channels > Using 16 octaves of pink noise > Rate set to 48000Hz (requested 48000Hz) > Buffer size range from 128 to 131072 > Period size range from 64 to 65536 > Using max buffer size 131072 > Periods = 4 > was set period_size = 32768 > was set buffer_size = 131072 > 0 - Front Left > > ============================================ > WARNING: possible recursive locking detected > 5.16.0-rc1-00270-gb2ae80663008 #11109 Not tainted > -------------------------------------------- > speaker-test/1312 is trying to acquire lock: > c1d78ca4 (&group->lock){....}-{2:2}, at: dpcm_be_dai_trigger+0x80/0x300 > > but task is already holding lock: > c1d788a4 (&group->lock){....}-{2:2}, at: > snd_pcm_action_lock_irq+0x68/0x7c > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&group->lock); > lock(&group->lock); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 1 lock held by speaker-test/1312: > #0: c1d788a4 (&group->lock){....}-{2:2}, at: > snd_pcm_action_lock_irq+0x68/0x7c > > stack backtrace: > CPU: 0 PID: 1312 Comm: speaker-test Not tainted > 5.16.0-rc1-00270-gb2ae80663008 #11109 > Hardware name: Samsung Exynos (Flattened Device Tree) > [<c0110b30>] (unwind_backtrace) from [<c010c618>] (show_stack+0x10/0x14) > [<c010c618>] (show_stack) from [<c0b65d24>] (dump_stack_lvl+0x58/0x70) > [<c0b65d24>] (dump_stack_lvl) from [<c0193740>] > (__lock_acquire+0x15ac/0x319c) > [<c0193740>] (__lock_acquire) from [<c0195dd8>] > (lock_acquire+0x14c/0x424) > [<c0195dd8>] (lock_acquire) from [<c0b745b8>] > (_raw_spin_lock_irqsave+0x44/0x60) > [<c0b745b8>] (_raw_spin_lock_irqsave) from [<c0926b6c>] > (dpcm_be_dai_trigger+0x80/0x300) > [<c0926b6c>] (dpcm_be_dai_trigger) from [<c0927004>] > (dpcm_fe_dai_do_trigger+0x124/0x1e4) > [<c0927004>] (dpcm_fe_dai_do_trigger) from [<c090728c>] > (snd_pcm_action+0x74/0xb0) > [<c090728c>] (snd_pcm_action) from [<c0907eac>] > (snd_pcm_action_lock_irq+0x3c/0x7c) > [<c0907eac>] (snd_pcm_action_lock_irq) from [<c02f13a0>] > (sys_ioctl+0x568/0xd44) > [<c02f13a0>] (sys_ioctl) from [<c0100060>] (ret_fast_syscall+0x0/0x1c) > Exception stack(0xc4777fa8 to 0xc4777ff0) > 7fa0: 004f5210 b6e27394 00000004 00004142 004f5398 > 004f5398 > 7fc0: 004f5210 b6e27394 00020000 00000036 00000000 00000000 bee588e8 > 00008000 > 7fe0: b6e277c4 bee58874 b6d8e888 b6c751dc > Time per period = 0.253397 > max98090 1-0010: PLL unlocked > BUG: sleeping function called from invalid context at > kernel/locking/rwsem.c:1526 > in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1312, name: > speaker-test > preempt_count: 1, expected: 0 > RCU nest depth: 0, expected: 0 > INFO: lockdep is turned off. > irq event stamp: 8158 > hardirqs last enabled at (8157): [<c0b747d0>] > _raw_spin_unlock_irqrestore+0x5c/0x60 > hardirqs last disabled at (8158): [<c0b74570>] > _raw_spin_lock_irq+0x58/0x5c > softirqs last enabled at (7854): [<c0101578>] __do_softirq+0x348/0x610 > softirqs last disabled at (7849): [<c012e7a4>] __irq_exit_rcu+0x144/0x1ec > Preemption disabled at: > [<00000000>] 0x0 > CPU: 0 PID: 1312 Comm: speaker-test Not tainted > 5.16.0-rc1-00270-gb2ae80663008 #11109 > Hardware name: Samsung Exynos (Flattened Device Tree) > [<c0110b30>] (unwind_backtrace) from [<c010c618>] (show_stack+0x10/0x14) > [<c010c618>] (show_stack) from [<c0b65d24>] (dump_stack_lvl+0x58/0x70) > [<c0b65d24>] (dump_stack_lvl) from [<c0158b04>] > (__might_resched+0x1c0/0x288) > [<c0158b04>] (__might_resched) from [<c0b71898>] (down_write+0x24/0x8c) > [<c0b71898>] (down_write) from [<c030ed64>] > (simple_recursive_removal+0x6c/0x370) > [<c030ed64>] (simple_recursive_removal) from [<c04d07a4>] > (debugfs_remove+0x38/0x4c) > [<c04d07a4>] (debugfs_remove) from [<c0928784>] > (dpcm_be_disconnect+0x160/0x2c4) > [<c0928784>] (dpcm_be_disconnect) from [<c092895c>] > (dpcm_fe_dai_cleanup+0x74/0xb0) > [<c092895c>] (dpcm_fe_dai_cleanup) from [<c0928d90>] > (dpcm_fe_dai_close+0xe8/0x14c) > [<c0928d90>] (dpcm_fe_dai_close) from [<c090977c>] > (snd_pcm_release_substream.part.0+0x3c/0xcc) > [<c090977c>] (snd_pcm_release_substream.part.0) from [<c0909878>] > (snd_pcm_release+0x54/0xa4) > [<c0909878>] (snd_pcm_release) from [<c02dc400>] (__fput+0x88/0x258) > [<c02dc400>] (__fput) from [<c014cd44>] (task_work_run+0x8c/0xc8) > [<c014cd44>] (task_work_run) from [<c010c08c>] > (do_work_pending+0x4a4/0x598) > [<c010c08c>] (do_work_pending) from [<c0100088>] > (slow_work_pending+0xc/0x20) > Exception stack(0xc4777fb0 to 0xc4777ff8) > 7fa0: 00000000 004f5260 004eaa9c > 00000000 > 7fc0: 004f5260 004f536c 004f5210 00000006 004fb700 004e6e8c 004d6120 > bee58cc4 > 7fe0: b6e27e64 bee58928 b6d8eda4 b6d09ac0 60000050 00000004 > > Let me know how I can help debugging this issue. > >> --- >> sound/soc/soc-pcm.c | 46 ++++++++++++++++++++++++++++----------------- >> 1 file changed, 29 insertions(+), 17 deletions(-) >> >> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c >> index 2e282c42bac2..7043857e30b1 100644 >> --- a/sound/soc/soc-pcm.c >> +++ b/sound/soc/soc-pcm.c >> @@ -46,12 +46,18 @@ static inline void >> snd_soc_dpcm_stream_lock_irq(struct snd_soc_pcm_runtime *rtd, >> snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(rtd, stream)); >> } >> +#define snd_soc_dpcm_stream_lock_irqsave(rtd, stream, flags) \ >> + snd_pcm_stream_lock_irqsave(snd_soc_dpcm_get_substream(rtd, >> stream), flags) >> + >> static inline void snd_soc_dpcm_stream_unlock_irq(struct >> snd_soc_pcm_runtime *rtd, >> int stream) >> { >> snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(rtd, >> stream)); >> } >> +#define snd_soc_dpcm_stream_unlock_irqrestore(rtd, stream, flags) \ >> + snd_pcm_stream_unlock_irqrestore(snd_soc_dpcm_get_substream(rtd, >> stream), flags) >> + >> #define DPCM_MAX_BE_USERS 8 >> static inline const char *soc_cpu_dai_name(struct >> snd_soc_pcm_runtime *rtd) >> @@ -2079,6 +2085,7 @@ int dpcm_be_dai_trigger(struct >> snd_soc_pcm_runtime *fe, int stream, >> { >> struct snd_soc_pcm_runtime *be; >> struct snd_soc_dpcm *dpcm; >> + unsigned long flags; >> int ret = 0; >> for_each_dpcm_be(fe, stream, dpcm) { >> @@ -2087,9 +2094,11 @@ int dpcm_be_dai_trigger(struct >> snd_soc_pcm_runtime *fe, int stream, >> be = dpcm->be; >> be_substream = snd_soc_dpcm_get_substream(be, stream); >> + snd_soc_dpcm_stream_lock_irqsave(be, stream, flags); >> + >> /* is this op for this BE ? */ >> if (!snd_soc_dpcm_be_can_update(fe, be, stream)) >> - continue; >> + goto next; >> dev_dbg(be->dev, "ASoC: trigger BE %s cmd %d\n", >> be->dai_link->name, cmd); >> @@ -2099,77 +2108,80 @@ int dpcm_be_dai_trigger(struct >> snd_soc_pcm_runtime *fe, int stream, >> if ((be->dpcm[stream].state != >> SND_SOC_DPCM_STATE_PREPARE) && >> (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) && >> (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_START; >> break; >> case SNDRV_PCM_TRIGGER_RESUME: >> if ((be->dpcm[stream].state != >> SND_SOC_DPCM_STATE_SUSPEND)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_START; >> break; >> case SNDRV_PCM_TRIGGER_PAUSE_RELEASE: >> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_START; >> break; >> case SNDRV_PCM_TRIGGER_STOP: >> if ((be->dpcm[stream].state != >> SND_SOC_DPCM_STATE_START) && >> (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) >> - continue; >> + goto next; >> if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP; >> break; >> case SNDRV_PCM_TRIGGER_SUSPEND: >> if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) >> - continue; >> + goto next; >> if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND; >> break; >> case SNDRV_PCM_TRIGGER_PAUSE_PUSH: >> if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) >> - continue; >> + goto next; >> if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream)) >> - continue; >> + goto next; >> ret = soc_pcm_trigger(be_substream, cmd); >> if (ret) >> - goto end; >> + goto next; >> be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED; >> break; >> } >> +next: >> + snd_soc_dpcm_stream_unlock_irqrestore(be, stream, flags); >> + if (ret) >> + break; >> } >> -end: >> if (ret < 0) >> dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n", >> __func__, be->dai_link->name, ret); > > Best regards Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland