On Mon, 18 Sep 2023, Hans de Goede wrote: > On 9/15/23 15:49, Ilpo Järvinen wrote: > > On Wed, 13 Sep 2023, Stephen Boyd wrote: > > > >> It's possible for the completion in ipc_wait_for_interrupt() to timeout, > >> simply because the interrupt was delayed in being processed. A timeout > >> in itself is not an error. This driver should check the status register > >> upon a timeout to ensure that scheduling or interrupt processing delays > >> don't affect the outcome of the IPC return value. > >> > >> CPU0 SCU > >> ---- --- > >> ipc_wait_for_interrupt() > >> wait_for_completion_timeout(&scu->cmd_complete) > >> [TIMEOUT] status[IPC_STATUS_BUSY]=0 > >> > >> Fix this problem by reading the status bit in all cases, regardless of > >> the timeout. If the completion times out, we'll assume the problem was > >> that the IPC_STATUS_BUSY bit was still set, but if the status bit is > >> cleared in the meantime we know that we hit some scheduling delay and we > >> should just check the error bit. > > > > Hi, > > > > I don't understand the intent here. What prevents IPC_STATUS_BUSY from > > changing right after you've read it in ipc_read_status(scu)? Doesn't that > > end you exactly into the same situation where the returned value is stale > > so I cannot see how this fixes anything, at best it just plays around the > > race window that seems to still be there after this fix? > > As I understand it the problem before was that the function would > return -ETIMEDOUT; purely based on wait_for_completion_timeout() > without ever actually checking the BUSY bit: > > Old code: > > if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT)) > return -ETIMEDOUT; > > This allows for a scenario where when the IRQ processing got delayed > (on say another core) causing the timeout to trigger, > ipc_wait_for_interrupt() would return -ETIMEDOUT even though > the BUSY flag was already cleared by the SCU. > > This patch adds an explicit check for the BUSY flag after > the wait_for_completion(), rather then relying on the > wait_for_completion() return value which implies things > are still busy. Oh, I see, it's because the code is waiting for the completion rather than the actual condition. > As for "What prevents IPC_STATUS_BUSY from > changing right after you've read it in ipc_read_status(scu)?" > > AFAICT in this code path the bit is only ever supposed to go > from being set (busy) to unset (not busy), not the other > way around since no new commands can be submitted until > this function has completed. So that scenario cannot happen. This is not what I meant. I meant that if the code has decided to return -ETIMEDOUT, the status bit still change at that point which makes the return value to not match. This race is still there and given the changelog was a bit sparse on what race it was fixing I ended up noticing this detail. -- i.