Re: [PATCH] bus: mhi: pm: Change mhi_pm_resume timeout value

Loic Poulain <loic.poulain@xxxxxxxxxx> · Fri, 5 Mar 2021 16:34:48 +0100

On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
>
> On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > Hi Jeffrey,
> >
> > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> >>
> >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> >>> maximum amount of time the controller device will take to be ready.
> >>> In case of PCI modems, this value is quite high given modems can take
> >>> up to 15 seconds from cold boot to be ready.
> >>>
> >>> Reusing this value in mhi_pm_resume can cause huge resuming latency
> >>> and delay the whole system resume (in case of system wide suspend/
> >>> resume), leading to bad use experience.
> >>
> >> I think this needs more explanation.  The timeout is a maximum value.
> >> You indicate that 2 seconds is more than enough for any MHI device to
> >> exit M3 (citation needed), but 15 seconds is too much?  The difference
> >> should only be apparent when the device doesn't transition in the timeout.
> >>
> >> Put another way, this doesn't say why 15 seconds is bad, if every device
> >> only needs 2, given that wait_event_timeout() doesn't always wait for
> >> the entire timeout value if the event occurs earlier.
> >
> > Yes, right that deserves an explanation: depending on the platform and
> > the suspend type (deep, s2idle), the PCI device may or may not lose
> > power. In case power is maintained, there is no problem and the
> > controller is successfully moved to M0. But in case of power loss, the
> > device is going to restart, and MHI resuming is going to timeout and
> > fail since M0 will never be reached. On PCI side we simply
> > reinitialize the controller in case of resume failure. So in other
> > words, MHI resume is expected to fail in some cases and it should be
> > handled with minimal impact on the system.
>
> Can we detect the power loss in far less than 2 seconds, and abort the
> resume process?  Waiting for the entire timeout, regardless of the
> value, in the power loss scenario you describe seems less than ideal for
> the system impact you are attempting to optimize.

That's a good question, like checking the state is M3 before trying
anything, need to check that.

Regards,
Loic