Re: [PATCH] bus: mhi: pm: Change mhi_pm_resume timeout value

Loic Poulain <loic.poulain@xxxxxxxxxx> · Fri, 5 Mar 2021 17:16:24 +0100

On Fri, 5 Mar 2021 at 16:34, Loic Poulain <loic.poulain@xxxxxxxxxx> wrote:
>
> On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> >
> > On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > > Hi Jeffrey,
> > >
> > > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> > >>
> > >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> > >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> > >>> maximum amount of time the controller device will take to be ready.
> > >>> In case of PCI modems, this value is quite high given modems can take
> > >>> up to 15 seconds from cold boot to be ready.
> > >>>
> > >>> Reusing this value in mhi_pm_resume can cause huge resume latency
> > >>> and delay the whole system resume (in case of system wide suspend/
> > >>> resume), leading to bad user experience.
> > >>
> > >> I think this needs more explanation.  The timeout is a maximum value.
> > >> You indicate that 2 seconds is more than enough for any MHI device to
> > >> exit M3 (citation needed), but 15 seconds is too much?  The difference
> > >> should only be apparent when the device doesn't transition in the timeout.
> > >>
> > >> Put another way, this doesn't say why 15 seconds is bad, if every device
> > >> only needs 2, given that wait_event_timeout() doesn't always wait for
> > >> the entire timeout value if the event occurs earlier.
> > >
> > > Yes, right, that deserves an explanation: depending on the platform and
> > > the suspend type (deep, s2idle), the PCI device may or may not lose
> > > power. In case power is maintained, there is no problem and the
> > > controller is successfully moved to M0. But in case of power loss, the
> > > device is going to restart, and MHI resuming is going to time out and
> > > fail since M0 will never be reached. On the PCI side we simply
> > > reinitialize the controller in case of resume failure. So in other
> > > words, MHI resume is expected to fail in some cases and it should be
> > > handled with minimal impact on the system.
> >
> > Can we detect the power loss in far less than 2 seconds, and abort the
> > resume process?  Waiting for the entire timeout, regardless of the
> > value, in the power loss scenario you describe seems less than ideal for
> > the system impact you are attempting to optimize.
>
> That's a good question. Something like checking that the state is M3
> before trying anything; I need to check that.

OK, please discard this patch, I've submitted another change that
takes care of this more properly.
Thanks, Jeffrey, for challenging this.
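
For anyone reading this thread later, the point discussed above (a
bounded wait returns early when the event occurs, so a long timeout
only hurts when the device never reaches M0, and a state check up
front could avoid the wait entirely) can be illustrated with a small
userspace simulation. This is only a sketch under stated assumptions:
it is not MHI code and not the follow-up change, time is simulated
rather than slept, and every name in it (fake_dev, fake_resume,
simulated_wait, the DEV_* states) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device states, loosely modelled on the M-states above. */
enum dev_state { DEV_M0, DEV_M3, DEV_RESET };

struct fake_dev {
	enum dev_state state;	/* state the device reports at resume time */
	int ms_to_m0;		/* simulated time to reach M0; < 0 means never */
};

struct resume_result {
	int err;	/* 0 on success, negative errno-style value on failure */
	int waited_ms;	/* simulated time spent waiting */
};

/*
 * Stand-in for a bounded wait: it "returns" as soon as the event occurs
 * and only consumes the full timeout when the event never happens.
 */
static int simulated_wait(const struct fake_dev *dev, int timeout_ms,
			  bool *reached)
{
	if (dev->ms_to_m0 >= 0 && dev->ms_to_m0 <= timeout_ms) {
		*reached = true;
		return dev->ms_to_m0;
	}
	*reached = false;
	return timeout_ms;
}

/*
 * Simulated resume. With check_state_first set, a device that is not in
 * M3 (assumed here to mean it lost power and restarted) is rejected
 * immediately instead of being waited on until the timeout expires.
 */
static struct resume_result fake_resume(const struct fake_dev *dev,
					int timeout_ms,
					bool check_state_first)
{
	struct resume_result res = { 0, 0 };
	bool reached;

	assert(dev != NULL);

	if (check_state_first && dev->state != DEV_M3) {
		res.err = -EIO;		/* abort early, nothing waited */
		return res;
	}

	res.waited_ms = simulated_wait(dev, timeout_ms, &reached);
	if (!reached)
		res.err = -ETIMEDOUT;
	return res;
}
```

In this model a device that kept power and needs 50 ms waits 50 ms
whether the timeout is 2000 or 15000, while a device that restarted
costs the full timeout unless the state is checked first, in which
case the failure is reported with no wait at all.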

Loic