[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback. I have updated the patch with the following amendments:
GuChun,

Regarding your concern about umc_v6_1_query_ras_error_count: while the UE/CE error counter registers are read, the local SW error counters are only ever incremented, never cleared, throughout the iteration over the UMC error counter registers.

Thank you,
John Clements

From: Chen, Guchun <Guchun.Chen@xxxxxxx>

[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))

Coding style problem: spaces are missing around the last “*”.

+ for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+ {

Another coding style problem: “{” should follow on the same line instead of starting a new line.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the error counters for all UMC channels, but we always use the same variable for the query. So will the value be overwritten by the new one? Then we would miss earlier error counts, if there are any. Correct?

Regards,
Guchun

From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx>
On Behalf Of Zhang, Hawking

[AMD Official Use Only - Internal Distribution Only]

Do UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset actually do the same thing? I guess you want to keep only one of them, right?

Regards,

From: Clements, John <John.Clements@xxxxxxx>
[AMD Official Use Only - Internal Distribution Only]

Added a patch to resolve an issue where error counter detection was not iterating over all UMC instances/channels. Also removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
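
For readers following the thread, the accumulation behaviour discussed above (one local counter struct, incremented but never cleared while looping over every UMC instance and channel) can be sketched roughly as below. This is a standalone illustration and not the patch itself: every `demo_`/`DEMO_` name, the constant values, and the stub register readers are assumptions made up for the example. Only the shape of the offset calculation mirrors the quoted UMC_REG_OFFSET macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real driver's
 * instance count, channel count and register strides may differ. */
#define DEMO_UMC_INST_NUM   2u
#define DEMO_CHANNEL_NUM    4u
#define DEMO_CHANNEL_OFFS   0x800u
#define DEMO_UMC_INST_DIST  0x40000u

/* Local SW error counters shared by the whole query. */
struct demo_err_data {
	unsigned long ce_count;	/* correctable errors */
	unsigned long ue_count;	/* uncorrectable errors */
};

/* Stand-in for a register read; takes a per-channel register offset. */
typedef uint32_t (*demo_read_fn)(uint32_t reg_offset);

/* Same arithmetic shape as the quoted UMC_REG_OFFSET macro. */
uint32_t demo_umc_reg_offset(uint32_t umc_inst, uint32_t ch_inst)
{
	assert(umc_inst < DEMO_UMC_INST_NUM && ch_inst < DEMO_CHANNEL_NUM);
	return DEMO_CHANNEL_OFFS * ch_inst + DEMO_UMC_INST_DIST * umc_inst;
}

/* Stub: every channel reports one correctable error. */
uint32_t demo_read_ce_stub(uint32_t reg_offset)
{
	(void)reg_offset;
	return 1;
}

/* Stub: only instance 0 / channel 0 (offset 0) reports an uncorrectable error. */
uint32_t demo_read_ue_stub(uint32_t reg_offset)
{
	return reg_offset == 0 ? 1 : 0;
}

/* Walk all instances and channels. The counters in *err are only ever
 * added to, so a later channel cannot overwrite an earlier channel's count. */
void demo_query_ras_error_count(demo_read_fn read_ce, demo_read_fn read_ue,
				struct demo_err_data *err)
{
	uint32_t umc_inst, ch_inst;

	for (umc_inst = 0; umc_inst < DEMO_UMC_INST_NUM; umc_inst++) {
		for (ch_inst = 0; ch_inst < DEMO_CHANNEL_NUM; ch_inst++) {
			uint32_t off = demo_umc_reg_offset(umc_inst, ch_inst);

			err->ce_count += read_ce(off);
			err->ue_count += read_ue(off);
		}
	}
}
```

With the stub readers above and a zero-initialised `struct demo_err_data`, the query ends with `ce_count` equal to 8 (one per channel across 2 x 4 channels) and `ue_count` equal to 1, which shows that the single error from the first channel survives the remaining iterations.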
Attachment:
0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
Description: 0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx