On Tue, Oct 11, 2022 at 03:46:15PM +0800, Yaxiong Tian wrote:
> >Note that there could also be the case in which there are many other
> >threads of execution on different cores trying to send a bunch of SCMI
> >commands (there are indeed many SCMI drivers currently in the stack), so
> >all these requests HAVE TO be queued while waiting for the shared
> >memory area to be free again, so I don't think you can assume in general
> >that shmem_tx_prepare() is called only after a successful command
> >completion or timeout: it is probably true when an SCMI Mailbox transport
> >is used, but it is not necessarily the case when other shmem based
> >transports like optee/smc are used. You could have N threads trying to
> >send an SCMI command, queued, spinning on that loop, while waiting for
> >the channel to be free: in such a case note that your TX has not even
> >completed, you are still waiting for the channel, so the SCMI upper layer
> >TX timeout has NOT even been started since the transmission itself is
> >still pending (i.e. send_message has still not returned...)
>
> I read the code in optee/smc: scmi_optee_send_message()/smc_send_message()
> take the channel lock before shmem_tx_prepare(). The lock is released when
> the transfer completes, times out or errors out. So although there may be
> N threads trying to send an SCMI command, queued, only one at a time can
> take up the channel. Also, only one thread calls shmem_tx_prepare() on a
> given channel and spins in it.
>
> The same is true for mailboxes or other shmem based transports, because
> the SCMI protocol says the agent must have exclusive use of the channel.
> It is very reasonable to enforce this through the channel lock rather
> than through the shared memory.
>
> So selecting the timeout period is simple: because there is only one
> thread spinning on a given channel, the timeout period depends only on
> the firmware. In other words, this time can be a constant.
>

Yes, my bad, I forgot that any shared-mem based transport has some kind of
mechanism to access the channel exclusively for the whole transaction, so
that no thread can issue a tx_prepare before the previous transaction has
fully completed (i.e. the result in the reply, if any, has been fetched
before being overwritten by the next one).

> >Not sure that all of this kind of work would be worthwhile to address
> >some, maybe transient, error conditions due to a broken SCMI server, BUT
> >in any case, any kind of timeout you want to introduce in the spin loop
> >MUST result in a failed transmission until the FREE bitflag is cleared by
> >the SCMI server; i.e. if that flag won't EVER be cleared by the server,
> >you have to end up with a sequence of timed-out spinloops and transmission
> >failures, you definitely cannot recover forcibly like this.
>
> I totally agree with the above. With such a broken SCMI server, users
> cannot get any hints. So I think it should at least pr_warn(). We could
> add a prepare_tx_timeout parameter in scmi_desc, or just provide options
> for users to check the error.
>

The problem is anyway, as you said, that you'll have to pick this timeout
from the related transport scmi_desc (even if as of now the max_rx_timeout
for all existing shared-mem transports is the same...) and this means anyway
adding more complexity to the chain of calls just to print a warn of some
kind in a rare error situation from which you cannot recover anyway.
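Just to be concrete about what "MUST result in a failed transmission" would
mean, a rough sketch of a bounded wait in the tx_prepare path could look
like the following (the helper name, the timeout value and the int return
are all made up for illustration; the in-tree shmem_tx_prepare() just spins
on the flag and returns void):

#define SHMEM_PREPARE_TX_TIMEOUT_MS	30	/* arbitrary, example only */

static int shmem_tx_prepare_bounded(struct scmi_shared_mem __iomem *shmem,
				    struct scmi_xfer *xfer)
{
	ktime_t stop = ktime_add_ms(ktime_get(), SHMEM_PREPARE_TX_TIMEOUT_MS);

	/* Wait for the platform to mark the channel free, but not forever. */
	while (!(ioread32(&shmem->channel_status) &
		 SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE)) {
		if (ktime_after(ktime_get(), stop)) {
			pr_warn_once("SCMI channel never freed by platform\n");
			/* Fail the TX; never forcibly reclaim the channel. */
			return -ETIMEDOUT;
		}
		cpu_relax();
	}

	/* ... usual header/payload setup as in shmem_tx_prepare() ... */
	return 0;
}

The caller (e.g. smc_send_message()) would then have to drop the channel
ownership and propagate the error up, so every later attempt keeps failing
the same way until the platform finally clears the FREE flag.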
Due to other unrelated discussions, I was starting to think about exposing
some debug-only (Kconfig dependent) SCMI stats like timeouts, errors,
unexpected/OoO/late_replies, in order to ease the debugging and monitoring
of the health of a running SCMI stack: maybe this could be the place to
flag these FW issues without changing the spinloop above (or to add the
kind of timeout you mentioned, but only when some sort of CONFIG_SCMI_DEBUG
is enabled...). I still have to fully think it through, though.

Any thoughts?

Thanks,
Cristian
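P.S. Just to give a rough shape to the debug-stats idea above (all the names
here, including the Kconfig symbol, the struct layout and the assumed 'stats'
member in struct scmi_info, are provisional and made up for illustration):

#ifdef CONFIG_SCMI_DEBUG
struct scmi_debug_stats {
	atomic_t	response_timeouts;
	atomic_t	xfer_errors;
	atomic_t	unexpected_replies;
	atomic_t	out_of_order_replies;
	atomic_t	late_replies;
};

/* Assumes a new 'stats' member hanging off struct scmi_info. */
#define scmi_stat_inc(info, field)	atomic_inc(&(info)->stats.field)
#else
struct scmi_debug_stats { };

#define scmi_stat_inc(info, field)	do { } while (0)
#endif

The counters could then be bumped at the points where we already detect
timeouts or late/unexpected replies and exposed read-only via debugfs, so
the production fast path stays untouched when the option is off.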