Re: [RFC] dt-bindings: mailbox: add doorbell support to ARM MHU

Jassi Brar <jassisinghbrar@xxxxxxxxx> · Fri, 5 Jun 2020 10:42:54 -0500

On Fri, Jun 5, 2020 at 3:58 AM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
>
> > > > >>     bash-1526  [000]  1149.472553: scmi_xfer_begin:      transfer_id=1538 msg_id=6 protocol_id=21 seq=0 poll=0
> > > > >>      <idle>-0     [001]  1149.472733: scmi_xfer_begin:      transfer_id=1539 msg_id=7 protocol_id=19 seq=1 poll=1
> > > > >
> > > > Here another request is started before the first is finished.
> > >
> > > Ah, the prints are when the client requested. It is not when the mailbox
> > > started it. So this just indicates the beginning of the transfer from the
> > > client.
> > >
> > There maybe condition on a sensor read to finish within 1ms, but there
> > is no condition for the read to _start_ at this very moment (usually
> > there are sleeps in the path to sensor requests).
> >
>
> Again I wasn't clear. The trace logs are at the point just before calling
> mbox_send_messages. So any delay in sensor drivers won't get include. It
> is after the point sensor driver request to read the value and before we
> send the request via mailbox.
>
No, you were clear, I wasn't. Let me try again.

Since origin upto scmi_xfer, there can be many forms of sleep like
schedule/mutexlock etc.... think of some userspace triggering sensor
or dvfs operation. Linux does not provide real-time guarantees. Even
if remote (scmi) firmware guarantee RT response, it makes sense to
timeout a response only after the _request is on the bus_  and not
when you submit a request to the api (unless you serialise it).
IOW, start the timeout from  mbox_client.tx_prepare()  when the
message actually gets on the bus.

> > You have shared only 'bad' log without serialising access. Please
> > share log after serialising access to the channel and the 'good' log
> > with virtual channels.  That should put the topic to rest.
> >
>
> I didn't realise that, sorry for missing that earlier. Attached both
> now, thanks for asking.
>
Interesting logs !  The time taken to complete _successful_ requests
are arguably better in bad_trace ... there are many <10usec responses
in bad_trace, while the fastest response in good_trace is  53usec.
And the requests that 'fail/timeout' are purely the result of not
serialising them or checkout for timeout at wrong place as explained
above.

thanks.