[AMD Official Use Only - AMD Internal Distribution Only]

OK

-----------------
Best Regards,
Thomas

-----Original Message-----
From: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
Sent: Friday, June 28, 2024 3:36 PM
To: Chai, Thomas <YiPeng.Chai@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>; Yang, Stanley <Stanley.Yang@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: sysfs node disable query error count during gpu reset

[AMD Official Use Only - AMD Internal Distribution Only]

it is better to apply changes on both ACA and MCA path.

Best Regards,
Kevin

-----Original Message-----
From: Chai, Thomas <YiPeng.Chai@xxxxxxx>
Sent: Friday, June 28, 2024 3:31 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; Yang, Stanley <Stanley.Yang@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
Subject: [PATCH] drm/amdgpu: sysfs node disable query error count during gpu reset

Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index ac7ded01dad0..ab2e11e1639e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -619,6 +619,7 @@ static const struct file_operations amdgpu_ras_debugfs_eeprom_ops = {
 static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
+	int ret;
 	struct ras_manager *obj = container_of(attr, struct ras_manager, sysfs_attr);
 	struct ras_query_if info = {
 		.head = obj->head,
@@ -627,7 +628,10 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
 	if (!amdgpu_ras_get_error_query_ready(obj->adev))
 		return sysfs_emit(buf, "Query currently inaccessible\n");
 
-	if (amdgpu_ras_query_error_status(obj->adev, &info))
+	ret = amdgpu_ras_query_error_status(obj->adev, &info);
+	if (ret == -EIO) /* gpu reset is ongoing */
+		return sysfs_emit(buf, "Query currently inaccessible\n");
+	else if (ret)
 		return -EINVAL;
 
 	if (amdgpu_ip_version(obj->adev, MP0_HWIP, 0) != IP_VERSION(11, 0, 2) &&
--
2.34.1