Re: [PATCH v4 3/8] drm/amdgpu: Fix SMU error failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 9/2/20 6:05 PM, Bjorn Helgaas wrote:
On Wed, Sep 02, 2020 at 02:42:05PM -0400, Andrey Grodzovsky wrote:
Wait for HW/PSP initiated ASIC reset to complete before
starting the recovery operations.

v2: Remove typo

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++++++++++++++++--
  1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e999f1f..412d07e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4838,14 +4838,32 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev)
  {
  	struct drm_device *dev = pci_get_drvdata(pdev);
  	struct amdgpu_device *adev = drm_to_adev(dev);
-	int r;
+	int r, i;
  	bool vram_lost;
+	u32 memsize;
DRM_INFO("PCI error: slot reset callback!!\n"); + /* wait for asic to come out of reset */
I know it's totally OCD, but it is a minor speed bump to read "asic"
here and "ASIC" in the commit log above and the new comment below.

+	msleep(500);
+
  	pci_restore_state(pdev);
- adev->in_pci_err_recovery = true;
+	/* confirm  ASIC came out of reset */
+	for (i = 0; i < adev->usec_timeout; i++) {
+		memsize = amdgpu_asic_get_config_memsize(adev);
+
+		if (memsize != 0xffffffff)
I guess this is a spot where you actually depend on an MMIO read
returning 0xffffffff because adev->in_pci_err_recovery is false at
this point, so amdgpu_mm_rreg() will actually *do* the MMIO read
instead of returning 0.  Right?


Yes, i picked this form another place where they use it to confirm PCI confspace and
the BARs specifically are restored and active and so you can read back sane MMIO regs values.

Andrey



+			break;
+		udelay(1);
+	}
+	if (memsize == 0xffffffff) {
+		r = -ETIME;
+		goto out;
+	}
+
+	/* TODO Call amdgpu_pre_asic_reset instead */
+	adev->in_pci_err_recovery = true;	
  	r = amdgpu_device_ip_suspend(adev);
  	adev->in_pci_err_recovery = false;
  	if (r)
--
2.7.4




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux