Re: [PATCH 2/2] drm/amdgpu/gfx9: put queue resets behind a debug option

On 2024-08-20 16:25, Alex Deucher wrote:
Pending extended validation.

Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 4 ++++
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             | 4 ++++
  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c           | 6 ++++++
  3 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index c63528a4e8941..1254a43ec96b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1151,6 +1151,10 @@ uint64_t kgd_gfx_v9_hqd_get_pq_addr(struct amdgpu_device *adev,
  	uint32_t low, high;
  	uint64_t queue_addr = 0;
+	if (!adev->debug_exp_resets &&
+	    !adev->gfx.num_gfx_rings)
+		return 0;
+

Did you put this in the HW-specific code path intentionally? If you want this check to apply to all ASICs, you should put it into detect_queue_hang in kfd_device_queue_manager.c. But maybe the extended validation is HW-specific.

Either way, the patch is

Acked-by: Felix Kuehling <felix.kuehling@xxxxxxx>
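
For illustration, the suggestion above (one check in the common hang-detection path instead of one per ASIC backend) could look roughly like the user-space model below. It is only a sketch, not kernel code. `struct model_device`, `model_queue_reset_allowed()` and `model_detect_hung_queue_addr()` are hypothetical names made up for this example. The only things taken from the patch are the two fields `debug_exp_resets` and `num_gfx_rings` and the condition that combines them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two amdgpu_device fields the patch tests. */
struct model_device {
	bool debug_exp_resets;      /* experimental resets debug option */
	unsigned int num_gfx_rings; /* 0 on compute-only parts */
};

/*
 * Same condition as the gfx v9 hunks, inverted: the patch bails out when
 * the debug option is off AND there are no gfx rings, so a reset is
 * allowed when either one is set.
 */
static bool model_queue_reset_allowed(const struct model_device *dev)
{
	return dev->debug_exp_resets || dev->num_gfx_rings != 0;
}

/*
 * Hypothetical common entry point: every backend goes through this one
 * gate, so the check is written once. Returning 0 mirrors the patched
 * kgd_gfx_v9_hqd_get_pq_addr(), which returns 0 when the gate is closed.
 */
static uint64_t model_detect_hung_queue_addr(const struct model_device *dev,
					     uint64_t hw_reported_addr)
{
	if (!model_queue_reset_allowed(dev))
		return 0;

	return hw_reported_addr;
}
```

Note that the gfx_v9_4_3 hunks test only `debug_exp_resets`, so a single common gate would need to cover that stricter case as well if it is meant to replace the per-ASIC checks.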


  	kgd_gfx_v9_acquire_queue(adev, pipe_id, queue_id, inst);
  	amdgpu_gfx_rlc_enter_safe_mode(adev, inst);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 21089aadbb7b4..8cf5d7925b51c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7233,6 +7233,10 @@ static int gfx_v9_0_reset_kcq(struct amdgpu_ring *ring,
  	unsigned long flags;
  	int i, r;
+	if (!adev->debug_exp_resets &&
+	    !adev->gfx.num_gfx_rings)
+		return -EINVAL;
+
  	if (amdgpu_sriov_vf(adev))
  		return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 2067f26d3a9d8..f8649546b9c4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -3052,6 +3052,9 @@ static void gfx_v9_4_3_ring_soft_recovery(struct amdgpu_ring *ring,
  	struct amdgpu_device *adev = ring->adev;
  	uint32_t value = 0;
+	if (!adev->debug_exp_resets)
+		return;
+
  	value = REG_SET_FIELD(value, SQ_CMD, CMD, 0x03);
  	value = REG_SET_FIELD(value, SQ_CMD, MODE, 0x01);
  	value = REG_SET_FIELD(value, SQ_CMD, CHECK_VMID, 1);
@@ -3475,6 +3478,9 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
  	unsigned long flags;
  	int r, i;
+	if (!adev->debug_exp_resets)
+		return -EINVAL;
+
  	if (amdgpu_sriov_vf(adev))
  		return -EINVAL;


