Currently dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages is at 90%
or greater for easier checks and planning, instead of waiting until the GPU
is full of bad pages.

Cc: Luben Tuikov <luben.tuikov@xxxxxxx>
Cc: Mukul Joshi <Mukul.Joshi@xxxxxxx>
Signed-off-by: Kent Russell <kent.russell@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 98732518543e..8270aad23a06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1077,6 +1077,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
 			if (res)
 				DRM_ERROR("RAS table incorrect checksum or error:%d\n",
 					  res);
+
+			/* threshold = -1 is automatic, threshold = 0 means that page
+			 * retirement is disabled.
+			 */
+			if (amdgpu_bad_page_threshold > 0 &&
+			    control->ras_num_recs >= 0 &&
+			    control->ras_num_recs >= (amdgpu_bad_page_threshold * 9 / 10))
+				DRM_WARN("RAS records:%u approaching threshold:%d",
+					 control->ras_num_recs,
+					 amdgpu_bad_page_threshold);
 		} else if (hdr->header == RAS_TABLE_HDR_BAD &&
 			   amdgpu_bad_page_threshold != 0) {
 			res = __verify_ras_table_checksum(control);
-- 
2.25.1
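
For reviewers who want to check the 90% arithmetic outside the driver, the
condition added above can be sketched as a standalone userspace function.
The helper name ras_near_threshold() is hypothetical and is not part of the
patch or of amdgpu. It assumes the record count is an unsigned 32-bit value
and the threshold a signed int, as the %u and %d format specifiers suggest.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: mirrors the condition
 * added to amdgpu_ras_eeprom_init() so it can be exercised in userspace.
 */
static bool ras_near_threshold(unsigned int num_recs, int threshold)
{
	/* -1 (automatic) and 0 (retirement disabled) never warn here. */
	if (threshold <= 0)
		return false;

	/* Integer math: threshold * 9 / 10 truncates toward zero. */
	return num_recs >= (unsigned int)(threshold * 9 / 10);
}
```

Because of the integer truncation, small thresholds round the limit down: a
threshold of 15 gives a limit of 13, and a threshold of 1 gives a limit of 0,
in which case the condition holds even with zero records.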