Re: [PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

On 17.10.23 at 16:36, Stanley.Yang wrote:
This is a workaround: the kiq ring test fails in the suspend stage
during ras recovery for gfx v9_4_3.

Any idea why that failed? Problems like this usually point to an incorrect init (or, in this case, re-init) procedure, and they are actually what the ring test should uncover.

Christian.


Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0
Signed-off-by: Stanley.Yang <Stanley.Yang@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 +++++++++++++++++++++
  1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 9a158018ae16..902e60203809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -29,6 +29,7 @@
  #include "amdgpu_rlc.h"
  #include "amdgpu_ras.h"
  #include "amdgpu_xcp.h"
+#include "amdgpu_xgmi.h"
 
  /* delay 0.1 second to enable gfx off feature */
  #define GFX_OFF_DELAY_ENABLE         msecs_to_jiffies(100)
@@ -501,6 +502,9 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
  {
  	struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];
  	struct amdgpu_ring *kiq_ring = &kiq->ring;
+	struct amdgpu_hive_info *hive;
+	struct amdgpu_ras *ras;
+	int hive_ras_recovery = 0;
  	int i, r = 0;
  	int j;
 
@@ -521,6 +525,23 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
  					   RESET_QUEUES, 0, 0);
  	}
 
+	/*
+	 * This is workaround: only skip kiq_ring test
+	 * during ras recovery in suspend stage for gfx v9_4_3
+	 */
+	hive = amdgpu_get_xgmi_hive(adev);
+	if (hive) {
+		hive_ras_recovery = atomic_read(&hive->ras_recovery);
+		amdgpu_put_xgmi_hive(hive);
+	}
+
+	ras = amdgpu_ras_get_context(adev);
+	if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
+		ras && (atomic_read(&ras->in_recovery) || hive_ras_recovery)) {
+		spin_unlock(&kiq->ring_lock);
+		return 0;
+	}
+
  	if (kiq_ring->sched.ready && !adev->job_hang)
  		r = amdgpu_ring_test_helper(kiq_ring);
  	spin_unlock(&kiq->ring_lock);
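
For illustration only, here is a minimal userspace sketch of the skip condition the last hunk adds. It assumes C11 atomics in place of the kernel's atomic_t. Every `model_*` name and `MODEL_IP_VERSION` is a hypothetical stand-in, not amdgpu API. It shows why the local hive flag needs a defined starting value: when the device is not part of an XGMI hive, the lookup returns NULL and the flag is never assigned.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields the check reads are modeled. */
struct model_hive { atomic_int ras_recovery; };
struct model_ras  { atomic_int in_recovery; };

struct model_dev {
	unsigned int gc_ip_version;  /* packed GC IP version */
	struct model_hive *hive;     /* NULL when not part of a hive */
	struct model_ras *ras;       /* NULL when no RAS context exists */
};

/* Assumed packing for this sketch, not necessarily the kernel's IP_VERSION(). */
#define MODEL_IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Returns true when the KIQ ring test would be skipped. */
static bool model_skip_kiq_ring_test(struct model_dev *dev)
{
	/* Start at 0 so the no-hive path reads a defined value. */
	int hive_ras_recovery = 0;

	if (dev->hive)
		hive_ras_recovery = atomic_load(&dev->hive->ras_recovery);

	/* Skip only on GC 9.4.3, and only while some RAS recovery is running. */
	return dev->gc_ip_version == MODEL_IP_VERSION(9, 4, 3) &&
	       dev->ras != NULL &&
	       (atomic_load(&dev->ras->in_recovery) != 0 ||
		hive_ras_recovery != 0);
}
```

In this model a GC 9.4.3 device with no hive and no recovery in flight still runs the ring test, and any other GC version always runs it.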



