On 05-06-20, 10:42, Jassi Brar wrote:
> Since origin upto scmi_xfer, there can be many forms of sleep like
> schedule/mutexlock etc.... think of some userspace triggering sensor
> or dvfs operation. Linux does not provide real-time guarantees. Even
> if remote (scmi) firmware guarantee RT response, it makes sense to
> timeout a response only after the _request is on the bus_ and not
> when you submit a request to the api (unless you serialise it).
> IOW, start the timeout from mbox_client.tx_prepare() when the
> message actually gets on the bus.

There are multiple purposes of the timeout IMO:

- Returning early if the other side is dead/hung. In such a case the
  timeout can start when the request is put on the bus, as we don't
  care about the time it takes to complete the request as long as the
  request can be fulfilled. An example of this is an i2c/spi memory
  read.

- Ensuring a maximum time within which the request needs to be
  serviced. There may be hard requirements, as in the case of DVFS
  from the scheduler's hot path (which is essential for the overall
  system to work well). And for such a case the timeout is placed at
  the right place IMO, i.e. right after a request is submitted to the
  mailbox.

And some more points I wanted to share..

- I am not sure I understood the *serializing* part you guys were
  talking about. I believe the mailbox framework is already
  serializing the requests it receives on a single channel with a
  spin lock, right? Why does the client need to serialize them as
  well? Is that for avoiding timeouts?

- For me, and Sudeep as well IIUC, the bigger problem isn't that
  timeouts are happening and requests are failing (and so changing the
  timeout to a bigger value isn't going to fix anything). The problem
  is that it takes too long (because of the queue of requests on a
  channel) for a request to finish after being submitted. The
  scheduler, for example, doesn't care about the underlying logistics;
  all it cares about is the time it takes to change the frequency of a
  CPU. If you can do it fast enough in a guaranteed manner, then you
  can use fast switching, otherwise not.

- The hardware can very well support the case today where this can be
  done in parallel and (almost) in a guaranteed time-frame, while the
  software wants to add a limit to that and so wants to serialize
  requests.

- As many people have already suggested (me, Sudeep, Rob, maybe Bjorn
  as well), it seems silly not to allow driving the h/w in the most
  efficient way possible (and allow fast cpu switching in this case).

> Interesting logs ! The time taken to complete _successful_ requests
> are arguably better in bad_trace ... there are many <10usec responses
> in bad_trace, while the fastest response in good_trace is 53usec.

Indeed this is interesting. It may be worth looking (separately) into
why we don't see those 3 us long requests anymore, or maybe they were
just not there in the logs.

> And the requests that 'fail/timeout' are purely the result of not
> serialising them or checkout for timeout at wrong place as explained
> above.

We can't allow the requests to go on forever in some cases, while in
other cases it may be absolutely fine.

--
viresh