Re: [PATCH v5 06/10] mhi: pci_generic: Add suspend/resume/recovery procedure

Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> · Wed, 23 Dec 2020 15:11:10 +0530

On Wed, Dec 23, 2020 at 09:25:37AM +0100, Loic Poulain wrote:
> Hi Mani,
> 
> On Tue, 22 Dec 2020 at 18:05, Manivannan Sadhasivam
> <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> >

[...]

> > > +
> > > +     /* Check if we can recover without full reset */
> > > +     pci_set_power_state(pdev, PCI_D0);
> > > +     pci_load_saved_state(pdev, mhi_pdev->pci_state);
> > > +     pci_restore_state(pdev);
> >
> > These pci state settings seems redundant with resume().
> >
> > In this function you should first check if MHI is alive, if yes then do
> > power up. Else you should just exit.
> 
> Recovery is not only executed on a resume but also when a crash or
> reboot is detected, that why we need to restore PCI state here.
> Moreover, contrary to resume, the restored PCI state is not the one
> saved in suspend, but the known working (and saved) initial pci state
> (mhi_pdev->pci_state).
> 

Ah I missed it!

> >
> > > +
> > > +     if (!mhi_pci_is_alive(mhi_cntrl))
> > > +             goto err_try_reset;
> > > +
> > > +     err = mhi_prepare_for_power_up(mhi_cntrl);
> > > +     if (err)
> > > +             goto err_try_reset;
> > > +
> > > +     err = mhi_sync_power_up(mhi_cntrl);
> > > +     if (err)
> > > +             goto err_unprepare;
> >
> > Add a debug log for recovery success.
> 
> Yes, will do.
> 
> >
> > > +
> > > +     set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
> > > +     return;
> > > +
> > > +err_unprepare:
> > > +     mhi_unprepare_after_power_down(mhi_cntrl);
> > > +err_try_reset:
> > > +     if (pci_reset_function(pdev))
> >
> > So if the device recovers, who will make sure reinitializing the MHI
> > controller? That's why I think we should convey the recovery result to
> > PM core. I don't think using workqueue here is a good idea.
> 
> The mhi controller is reinitialized in the recovery work itself.
> Recovery can be a long process, and play with device
> registering/deregistering. We can not do that synchronously in the
> system resume path since it causes unexpected resume latency (this is
> actually no more a resume but a complete reset), moving it
> synchronously in resume cause hang on my side. However I agree that
> the PM core should be informed about the resume failure, so instead of
> unconditionally returning success in the resume callback I'm going to
> forward the error to PM core (and trigger recovery in parallel).
> 

okay.

> >
> > > +             dev_err(&pdev->dev, "Recovery failed\n");
> > > +}
> > > +
> > >  static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  {
> > >       const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
> > > @@ -327,6 +371,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >       if (!mhi_pdev)
> > >               return -ENOMEM;
> > >
> > > +     INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work);
> > > +
> > >       mhi_cntrl_config = info->config;
> > >       mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > >
> > > @@ -391,6 +437,8 @@ static void mhi_pci_remove(struct pci_dev *pdev)
> > >       struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
> > >       struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > >
> > > +     cancel_work_sync(&mhi_pdev->recovery_work);
> > > +
> > >       if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> > >               mhi_power_down(mhi_cntrl, true);
> > >               mhi_unprepare_after_power_down(mhi_cntrl);
> > > @@ -456,12 +504,66 @@ static const struct pci_error_handlers mhi_pci_err_handler = {
> > >       .reset_done = mhi_pci_reset_done,
> > >  };
> > >
> > > +static int  __maybe_unused mhi_pci_suspend(struct device *dev)
> > > +{
> > > +     struct pci_dev *pdev = to_pci_dev(dev);
> > > +     struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
> > > +     struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > > +
> > > +     cancel_work_sync(&mhi_pdev->recovery_work);
> > > +
> > > +     /* Transition to M3 state */
> > > +     mhi_pm_suspend(mhi_cntrl);
> > > +
> > > +     pci_save_state(pdev);
> > > +     pci_disable_device(pdev);
> > > +     pci_wake_from_d3(pdev, true);
> > > +     pci_set_power_state(pdev, PCI_D3hot);
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int __maybe_unused mhi_pci_resume(struct device *dev)
> > > +{
> > > +     struct pci_dev *pdev = to_pci_dev(dev);
> > > +     struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
> > > +     struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > > +     int err;
> > > +
> > > +     pci_set_power_state(pdev, PCI_D0);
> > > +     pci_restore_state(pdev);
> > > +     pci_set_master(pdev);
> > > +
> > > +     err = pci_enable_device(pdev);
> > > +     if (err)
> > > +             goto err_recovery;
> > > +
> > > +     /* Exit M3, transition to M0 state */
> > > +     err = mhi_pm_resume(mhi_cntrl);
> > > +     if (err) {
> > > +             dev_err(&pdev->dev, "failed to resume device: %d\n", err);
> > > +             goto err_recovery;
> > > +     }
> > > +
> > > +     return 0;
> > > +
> > > +err_recovery:
> > > +     /* The device may have loose power or crashed, try recovering it */
> >
> > Did you actually hit this scenario? In the case of power loss or crash, can we
> > access the MHI register space?
> 
> Yes I hit this scenario on my computer since PCI power is not
> maintained, mhi_pm_resume behaves correctly whether the MHI register
> space is available or not since it will hit and return an error moving
> to M0 state:
> mhi mhi0: Did not enter M0 state, MHI state: M3, PM state: M3->M0
> 

Okay. As long as you are returning the error code to PM core I'm fine.

Thanks,
Mani

> 
> Regards,
> Loic