This series enables the GPU RMA (Return Merchandise Authorization)
feature, which is triggered when the number of bad pages detected by
RAS ECC exceeds the threshold value.

When the number of saved bad pages written to the EEPROM reaches the
threshold, a RAS recovery is issued immediately. The recovery fails,
which tells the user that the GPU is BAD and needs to be retired for
further checking.

During boot-up, a similar BAD GPU check is conducted when the EEPROM
is initialized, and it aborts boot-up so that the user is made aware
of the problem.

The user can set bad_page_threshold=0 when probing the amdgpu driver
to disable this feature and bring up the GPU, and then reset the
EEPROM later.

Guchun Chen (5):
  drm/amdgpu: add bad page count threshold in module parameter
  drm/amdgpu: validate bad page threshold in ras
  drm/amdgpu: conduct bad gpu check during bootup/reset
  drm/amdgpu: restore ras flags when user resets eeprom
  drm/amdgpu: calculate actual size instead of hardcode size

 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 21 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       | 70 ++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h       | 19 +++-
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 98 ++++++++++++++++++-
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h    |  8 +-
 7 files changed, 211 insertions(+), 17 deletions(-)

-- 
2.17.1
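
For illustration only, the threshold check described in this cover letter could be sketched roughly as below. This is not the code from the patches: `struct ras_sketch`, `ras_sketch_gpu_is_bad()` and the field names are hypothetical stand-ins, and the `>=` comparison is an assumption based on the wording "reach the threshold". Only the documented behavior that bad_page_threshold=0 disables the check is taken from the text above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's RAS state; not an amdgpu type. */
struct ras_sketch {
	unsigned int bad_page_threshold; /* 0 disables the check (see above) */
	unsigned int bad_page_count;     /* bad pages saved in the EEPROM */
};

/*
 * Return true when the GPU should be reported as BAD.
 * Assumption: the GPU is BAD once the saved count reaches the threshold.
 */
bool ras_sketch_gpu_is_bad(const struct ras_sketch *ras)
{
	/* The feature is switched off, so the GPU is never flagged. */
	if (ras->bad_page_threshold == 0)
		return false;

	return ras->bad_page_count >= ras->bad_page_threshold;
}
```

Under these assumptions the same helper could serve both paths mentioned above: after new bad pages are saved at runtime, and once when the EEPROM is initialized during boot-up.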