RE: [PATCH Review 1/2] drm/amdgpu: skip query ecc info in gpu recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang@xxxxxxx>

Regards,
Hawking
-----Original Message-----
From: Stanley.Yang <Stanley.Yang@xxxxxxx> 
Sent: Thursday, December 2, 2021 16:02
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
Cc: Yang, Stanley <Stanley.Yang@xxxxxxx>
Subject: [PATCH Review 1/2] drm/amdgpu: skip query ecc info in gpu recovery

this is a workaround due to get ecc info failed during gpu recovery

[  700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table!
[  700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.331171] amdgpu: qcm fence wait loop timeout expired [  704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption [  704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress [  704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62 [  710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007 [  710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features.
[  710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features!
[  710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62

Signed-off-by: Stanley.Yang <Stanley.Yang@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 33fa1f619926..641f9b447997 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -895,6 +895,10 @@ static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_d
 	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 	int ret = 0;
 
+	/* skip get ecc info during gpu recovery */
+	if (atomic_read(&ras->in_recovery) == 1)
+		return;
+
 	/*
 	 * choosing right query method according to
 	 * whether smu support query error information
--
2.17.1




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux