The current hw_init code for si_dpm ignores the return value of the
function that starts the thermal controller. As a result, hw_init sets
dpm_enabled to true even when thermal controller initialization fails,
when it should be false.

This patch:
- Checks the return value of thermal controller initialization and
  returns it from si_dpm_enable() on failure.
- Adds a DRM_ERROR to indicate this failure.

Cc: Alex Deucher <Alexander.Deucher@xxxxxxx>
Cc: Maruthi Bayyavarapu <maruthi.bayyavarapu@xxxxxxx>
Cc: Sonny Jing <Sonny.Jiang@xxxxxxx>

PS: This issue was observed on OLAND while running the reboot stress
test.

Signed-off-by: Shashank Sharma <shashank.sharma@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/si_dpm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/si_dpm.c b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
index c00ba4b23c9a..923a1da554b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dpm.c
@@ -6868,7 +6868,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	si_start_dpm(adev);
 
 	si_enable_auto_throttle_source(adev, AMDGPU_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
-	si_thermal_start_thermal_controller(adev);
+	ret = si_thermal_start_thermal_controller(adev);
+	if (ret) {
+		DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+		return ret;
+	}
 
 	return 0;
 }
-- 
2.25.1
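
For illustration only, the error propagation described in the commit message can be sketched in user space as below. This is not the driver code: struct fake_device and all fake_* functions are hypothetical stand-ins, and the way fake_hw_init derives dpm_enabled from the return value is an assumption based on the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real kernel type. */
struct fake_device {
	bool thermal_ok;	/* simulates whether the thermal controller starts */
	bool dpm_enabled;	/* mirrors the dpm_enabled status from the description */
};

/* Stand-in for si_thermal_start_thermal_controller():
 * returns 0 on success, a negative errno value on failure.
 */
int fake_start_thermal_controller(struct fake_device *dev)
{
	return dev->thermal_ok ? 0 : -EINVAL;
}

/* Stand-in for si_dpm_enable() with the check applied:
 * a thermal controller failure is logged and returned to the caller.
 */
int fake_dpm_enable(struct fake_device *dev)
{
	int ret;

	ret = fake_start_thermal_controller(dev);
	if (ret) {
		fprintf(stderr, "fake_start_thermal_controller failed\n");
		return ret;
	}

	return 0;
}

/* Stand-in for the hw_init caller (assumed behaviour):
 * dpm_enabled is true only when the enable step succeeded.
 */
int fake_hw_init(struct fake_device *dev)
{
	int ret = fake_dpm_enable(dev);

	dev->dpm_enabled = (ret == 0);
	return ret;
}
```

Without the check in fake_dpm_enable(), the function would always return 0 and fake_hw_init() would mark DPM as enabled even after a thermal controller failure, which is the behaviour the patch is meant to fix.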