On Wed, Dec 23, 2020 at 09:25:37AM +0100, Loic Poulain wrote: > Hi Mani, > > On Tue, 22 Dec 2020 at 18:05, Manivannan Sadhasivam > <manivannan.sadhasivam@xxxxxxxxxx> wrote: > > [...] > > > + > > > + /* Check if we can recover without full reset */ > > > + pci_set_power_state(pdev, PCI_D0); > > > + pci_load_saved_state(pdev, mhi_pdev->pci_state); > > > + pci_restore_state(pdev); > > > > These pci state settings seems redundant with resume(). > > > > In this function you should first check if MHI is alive, if yes then do > > power up. Else you should just exit. > > Recovery is not only executed on a resume but also when a crash or > reboot is detected, that why we need to restore PCI state here. > Moreover, contrary to resume, the restored PCI state is not the one > saved in suspend, but the known working (and saved) initial pci state > (mhi_pdev->pci_state). > Ah I missed it! > > > > > + > > > + if (!mhi_pci_is_alive(mhi_cntrl)) > > > + goto err_try_reset; > > > + > > > + err = mhi_prepare_for_power_up(mhi_cntrl); > > > + if (err) > > > + goto err_try_reset; > > > + > > > + err = mhi_sync_power_up(mhi_cntrl); > > > + if (err) > > > + goto err_unprepare; > > > > Add a debug log for recovery success. > > Yes, will do. > > > > > > + > > > + set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status); > > > + return; > > > + > > > +err_unprepare: > > > + mhi_unprepare_after_power_down(mhi_cntrl); > > > +err_try_reset: > > > + if (pci_reset_function(pdev)) > > > > So if the device recovers, who will make sure reinitializing the MHI > > controller? That's why I think we should convey the recovery result to > > PM core. I don't think using workqueue here is a good idea. > > The mhi controller is reinitialized in the recovery work itself. > Recovery can be a long process, and play with device > registering/deregistering. We can not do that synchronously in the > system resume path since it causes unexpected resume latency (this is > actually no more a resume but a complete reset), moving it > synchronously in resume cause hang on my side. However I agree that > the PM core should be informed about the resume failure, so instead of > unconditionally returning success in the resume callback I'm going to > forward the error to PM core (and trigger recovery in parallel). > okay. > > > > > + dev_err(&pdev->dev, "Recovery failed\n"); > > > +} > > > + > > > static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > > > { > > > const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data; > > > @@ -327,6 +371,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > > > if (!mhi_pdev) > > > return -ENOMEM; > > > > > > + INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work); > > > + > > > mhi_cntrl_config = info->config; > > > mhi_cntrl = &mhi_pdev->mhi_cntrl; > > > > > > @@ -391,6 +437,8 @@ static void mhi_pci_remove(struct pci_dev *pdev) > > > struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev); > > > struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl; > > > > > > + cancel_work_sync(&mhi_pdev->recovery_work); > > > + > > > if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) { > > > mhi_power_down(mhi_cntrl, true); > > > mhi_unprepare_after_power_down(mhi_cntrl); > > > @@ -456,12 +504,66 @@ static const struct pci_error_handlers mhi_pci_err_handler = { > > > .reset_done = mhi_pci_reset_done, > > > }; > > > > > > +static int __maybe_unused mhi_pci_suspend(struct device *dev) > > > +{ > > > + struct pci_dev *pdev = to_pci_dev(dev); > > > + struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev); > > > + struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl; > > > + > > > + cancel_work_sync(&mhi_pdev->recovery_work); > > > + > > > + /* Transition to M3 state */ > > > + mhi_pm_suspend(mhi_cntrl); > > > + > > > + pci_save_state(pdev); > > > + pci_disable_device(pdev); > > > + pci_wake_from_d3(pdev, true); > > > + pci_set_power_state(pdev, PCI_D3hot); > > > + > > > + return 0; > > > +} > > > + > > > +static int __maybe_unused mhi_pci_resume(struct device *dev) > > > +{ > > > + struct pci_dev *pdev = to_pci_dev(dev); > > > + struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev); > > > + struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl; > > > + int err; > > > + > > > + pci_set_power_state(pdev, PCI_D0); > > > + pci_restore_state(pdev); > > > + pci_set_master(pdev); > > > + > > > + err = pci_enable_device(pdev); > > > + if (err) > > > + goto err_recovery; > > > + > > > + /* Exit M3, transition to M0 state */ > > > + err = mhi_pm_resume(mhi_cntrl); > > > + if (err) { > > > + dev_err(&pdev->dev, "failed to resume device: %d\n", err); > > > + goto err_recovery; > > > + } > > > + > > > + return 0; > > > + > > > +err_recovery: > > > + /* The device may have loose power or crashed, try recovering it */ > > > > Did you actually hit this scenario? In the case of power loss or crash, can we > > access the MHI register space? > > Yes I hit this scenario on my computer since PCI power is not > maintained, mhi_pm_resume behaves correctly whether the MHI register > space is available or not since it will hit and return an error moving > to M0 state: > mhi mhi0: Did not enter M0 state, MHI state: M3, PM state: M3->M0 > Okay. As long as you are returning the error code to PM core I'm fine. Thanks, Mani > > Regards, > Loic