On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky
<andrey.grodzovsky@xxxxxxx> wrote:
>
> Problem:
> When a device goes into a sleep state due to prolonged
> inactivity (e.g. BACO sleep) and is then hot unplugged,
> the PCI core will try to wake up the device as part of
> the unplug process. Since the device is gone, all HW
> programming during rpm resume fails, leading
> to a bad SW state later during pci remove handling.
>
> Fix:
> Use a flag we use for PCIe error recovery to avoid
> accessing registers. This allows us to successfully complete
> the rpm resume sequence and finish pci remove.

Might make sense to create a preliminary patch to change the name of
this flag to something like no_hw_access since it's not specific to
pci error handling.

Alex

>
> P.S. Must use pci_device_is_present and not drm_dev_enter/exit
> here since rpm resume happens before PCI remove and so the
> unplug flag is not set yet.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d8db5929cdd9..ab95ebf56636 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1555,6 +1555,11 @@ static int amdgpu_pmops_runtime_resume(struct device *dev)
>         if (!adev->runpm)
>                 return -EINVAL;
>
> +       /* Avoids registers access if device is physically gone */
> +       if (!pci_device_is_present(adev->pdev))
> +               adev->in_pci_err_recovery = true;
> +
> +
>         if (amdgpu_device_supports_px(drm_dev)) {
>                 drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>
> --
> 2.25.1
>
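
For reference, here is a rough user-space sketch of the pattern under
discussion: set a "no hardware access" flag at the top of the resume
path when the device is gone, and have the register write helper bail
out when that flag is set. Everything in it is hypothetical and only
for illustration: struct mock_dev, mock_wreg() and mock_runtime_resume()
are not driver code, the "present" field merely stands in for what
pci_device_is_present() would report, and no_hw_access is just the
rename suggested above, not an existing field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_REGS 4

/* Hypothetical stand-in for the driver's device struct (not amdgpu_device). */
struct mock_dev {
	bool present;                 /* models the result of pci_device_is_present() */
	bool no_hw_access;            /* assumed name, per the rename suggested above */
	uint32_t regs[MOCK_NUM_REGS]; /* fake register file */
	unsigned int hw_writes;       /* number of writes that reached the "hardware" */
};

/* Register write helper: skipped entirely once hardware access is disabled. */
static void mock_wreg(struct mock_dev *dev, size_t idx, uint32_t val)
{
	assert(dev != NULL && idx < MOCK_NUM_REGS);

	if (dev->no_hw_access)
		return;

	dev->regs[idx] = val;
	dev->hw_writes++;
}

/*
 * Resume path: check for a missing device first, then run the usual
 * programming sequence. With the flag set, the sequence still runs to
 * completion but never touches the (absent) hardware.
 */
static int mock_runtime_resume(struct mock_dev *dev)
{
	assert(dev != NULL);

	if (!dev->present)
		dev->no_hw_access = true;

	mock_wreg(dev, 0, 0x1);
	mock_wreg(dev, 1, 0xabcd);

	return 0;
}
```

With "present" false, mock_runtime_resume() returns 0 and hw_writes
stays at 0; with "present" true, both writes go through. The point of
keeping the check in the helper is that the resume sequence itself does
not need a special case for the unplugged device.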