Re: [Sound-open-firmware] ASoC: SOF: Race condition in ipc.c

Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> · Tue, 21 Jun 2022 08:11:48 -0500

On 6/20/22 21:46, noman pouigt wrote:
> Folks,
> 
> I have borrowed part of SOF architecture for my own DSP
> framework development as the memory on the DSP is
> extremely small and wouldn't be able to support SOF.

You're providing very little context here. Of course it's open-source
and you can do whatever you want with the code, but it's not clear if
the 'borrowed' code can deal with your specific case.

> Currently I am running into a race condition as below:
> 
> CPU                                            DSP
> PCM_TRIGGER_START
> sof_ipc_send_msg ---->
> 
>                                       <------Immediate ACK
> ipc3_wait_tx_done
> (wait_event_timeout)
>                                       <------ POSITION update
> 
> snd_pcm_period_elapsed
> 
> 
> As you can see TRIGGER_START didn't even finish
> and waiting for it to complete in ipc3_wait_tx_done
> function. However, before it could complete the position
> interrupt was issued which results in calling period_elapsed
> function.
> 
> In order to fix this I assume below is called in SOF framework:
> schedule_work(&spcm->stream[substream->stream].period_elapsed_work);
> 
> How is this design working? If the interrupt is coming too fast
> from the DSP than the associated function with this schedule_work
> will not get called as the scheduler will not get time to schedule the
> workqueue and elapsed function will not be called thereby not increasing
> the hw_ptr. How is the flow control for data transfer achieved?

The schedule_work was added by this commit, and the explanations are
rather straightforward:

commit e2803e610aecb36ea4fec5a04861547664580d0c

Author: Keyon Jie <yang.jie@xxxxxxxxxxxxxxx>

Date:   Tue Apr 30 18:09:25 2019 -0500

ASoC: SOF: PCM: add period_elapsed work to fix race condition in
interrupt context

"
The IPC implementation in SOF requires sending IPCs serially: we should
not send a new IPC command to the firmware before we get an ACK (or time
out) from firmware, and the IRQ processing is complete.

snd_pcm_period_elapsed() can be called in interrupt context before
IRQ_HANDLED is returned. When the PCM is done draining, a STOP
IPC will then be sent, which breaks the expectation that IPCs are
handled serially and leads to IPC timeouts.

This patch adds a workqueue to defer the call to snd_pcm_elapsed() after
the IRQ is handled.
"

I am not sure what the problem you have really is.

It's not really surprising that the first period is consumed
immediately, the flow will become more steady-state afterwards.

The DMAs should be primed to deal with more than one period, and the
schedule_work() will only signal that the application can refill the
ring buffer. There's all kinds of delays that will happen depending on
load and scheduling, and if the POSITION_UPDATE was received immediately
then there's should be still plenty of time to provide a new buffer.

> 
> I am facing the above problem in my design.
> 
> I am wondering if I can simply add one more IPC command(don't call
> wait_event_interruptible for this) after TRIGGER_START to start the
> streaming.This way schedule_work for updating period_elapsed function
> can be avoided and it can be called in an interrupt context.

See commit above, this won't work because you'll be sending an IPC while
dealing with an IPC.