Currently, the sdma edc counters are grouped in the gfx edc counter register
array (sec_ded_counter_registers), which results in several issues, including:

1). sdma ras errors are counted into the gfx ip block when querying the gfx
error counter (i.e. through the sysfs gfx_error_count node).
2). kernel crash (NULL pointer access) when querying the gfx error counter on
vega20. vega20 has only 2 sdma instances, while the gfx edc counter register
array unifies the arcturus and vega20 cases. The driver is therefore forced to
read the sdma2 ~ 7 edc counter registers even though their ip base address is
not initialized.
3). unnecessary/wrong grbm switch even when reading the sdma edc counters.

To fix the above issues, this series separates the sdma ras query functions
from the gfx ones. It checks the sdma_edc_counters and reports back the error
count and the error type as well.

Hawking Zhang (4):
  drm/amdgpu: add query_ras_error_count function for sdma v4
  drm/amdgpu: support error reporting for sdma ip block
  drm/amdgpu: add ras_late_init and ras_fini for sdma v4
  drm/amdgpu: read sdma edc counter to clear the counters

 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h |   9 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    |  11 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 176 ++++++++++++++++++++++-
 4 files changed, 191 insertions(+), 12 deletions(-)

--
2.17.1