On Wed, Sep 02, 2020 at 02:42:05PM -0400, Andrey Grodzovsky wrote: > Wait for HW/PSP initiated ASIC reset to complete before > starting the recovery operations. > > v2: Remove typo > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> > Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index e999f1f..412d07e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) > { > struct drm_device *dev = pci_get_drvdata(pdev); > struct amdgpu_device *adev = drm_to_adev(dev); > - int r; > + int r, i; > bool vram_lost; > + u32 memsize; > > DRM_INFO("PCI error: slot reset callback!!\n"); > > + /* wait for asic to come out of reset */ I know it's totally OCD, but it is a minor speed bump to read "asic" here and "ASIC" in the commit log above and the new comment below. > + msleep(500); > + > pci_restore_state(pdev); > > - adev->in_pci_err_recovery = true; > + /* confirm ASIC came out of reset */ > + for (i = 0; i < adev->usec_timeout; i++) { > + memsize = amdgpu_asic_get_config_memsize(adev); > + > + if (memsize != 0xffffffff) I guess this is a spot where you actually depend on an MMIO read returning 0xffffffff because adev->in_pci_err_recovery is false at this point, so amdgpu_mm_rreg() will actually *do* the MMIO read instead of returning 0. Right? > + break; > + udelay(1); > + } > + if (memsize == 0xffffffff) { > + r = -ETIME; > + goto out; > + } > + > + /* TODO Call amdgpu_pre_asic_reset instead */ > + adev->in_pci_err_recovery = true; > r = amdgpu_device_ip_suspend(adev); > adev->in_pci_err_recovery = false; > if (r) > -- > 2.7.4 >