[AMD Official Use Only] Reviewed-by: Hawking Zhang <Hawking.Zhang@xxxxxxx> Regards, Hawking -----Original Message----- From: Stanley.Yang <Stanley.Yang@xxxxxxx> Sent: Tuesday, December 7, 2021 14:40 To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx> Cc: Yang, Stanley <Stanley.Yang@xxxxxxx> Subject: [PATCH Review 1/1] drm/amdgpu: skip umc ras error count harvest remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter Signed-off-by: Stanley.Yang <Stanley.Yang@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1043d41b6807..a95d200adff9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -897,11 +897,6 @@ static void amdgpu_ras_get_ecc_info(struct amdgpu_device *adev, struct ras_err_d struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); int ret = 0; - /* skip get ecc info during gpu recovery */ - if (atomic_read(&ras->in_recovery) == 1 && - adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2)) - return; - /* * choosing right query method according to * whether smu support query error information @@ -1752,6 +1747,16 @@ static void amdgpu_ras_log_on_err_counter(struct amdgpu_device *adev) if (info.head.block == AMDGPU_RAS_BLOCK__PCIE_BIF) continue; + /* + * this is a workaround for aldebaran, skip send msg to + * smu to get ecc_info table due to smu handle get ecc + * info table failed temporarily. + * should be removed until smu fix handle ecc_info table. + */ + if ((info.head.block == AMDGPU_RAS_BLOCK__UMC) && + (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 2))) + continue; + amdgpu_ras_query_error_status(adev, &info); } } -- 2.17.1