[AMD Official Use Only - General]

I'm OK with dropping si_set_temperature_range() from late_init.
However, it's still not clear to me how this could lead to the reboot exception.
Can you dig into this a little further? For example, can you check whether the operation (si_thermal_start_thermal_controller()) had actually already failed in hw_init (in si_dpm_enable(), more specifically)?

@@ -6918,7 +6918,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	si_start_dpm(adev);
 	si_enable_auto_throttle_source(adev, SI_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
-	si_thermal_start_thermal_controller(adev);
+	ret = si_thermal_start_thermal_controller(adev);
+	if (ret) {
+		DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+		return ret;
+	}
 	ni_update_current_ps(adev, boot_ps);

BR
Evan

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun <Guchun.Chen@xxxxxxx>
> Cc: David Airlie <airlied@xxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>;
> Zhenneng Li <lizhenneng@xxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Daniel Vetter <daniel@xxxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>
> Subject: [PATCH v2] drm/amdgpu: resove reboot exception for si oland
>
> During reboot test on arm64 platform, it may failure
> on boot.
>
> The error message are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
> late_init of IP block <si_dpm> failed -22
> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init
> failed
> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> ---
> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct
> amdgpu_device *adev,
>
>  static int si_dpm_late_init(void *handle)
>  {
> -	int ret;
> -	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> -	if (!adev->pm.dpm_enabled)
> -		return 0;
> -
> -	ret = si_set_temperature_range(adev);
> -	if (ret)
> -		return ret;
> -#if 0 //TODO ?
> -	si_dpm_powergate_uvd(adev, true);
> -#endif
>  	return 0;
>  }
>
> --
> 2.25.1
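
To illustrate the question about hw_init: the -22 in the log corresponds to -EINVAL on Linux. The stand-alone sketch below is not driver code. It only models the hypothesis that the hw_init path and the late_init path hit the same failing thermal step, which has not been confirmed in this thread. All `model_*` names and the `broken` flag are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the thermal range programming step.
 * `broken` models an assumed state in which the step fails.
 */
static int model_set_temperature_range(int broken)
{
	return broken ? -EINVAL : 0;
}

/* hw_init path without a check: the return value is discarded. */
static int model_dpm_enable_unchecked(int broken)
{
	(void)model_set_temperature_range(broken);
	return 0;
}

/* hw_init path with a check: the failure is reported where it happens. */
static int model_dpm_enable_checked(int broken)
{
	int ret = model_set_temperature_range(broken);

	if (ret) {
		fprintf(stderr, "thermal controller start failed: %d\n", ret);
		return ret;
	}
	return 0;
}

/* late_init path before the patch: repeats the step and returns its error. */
static int model_late_init(int broken)
{
	return model_set_temperature_range(broken);
}
```

Under that assumption, the unchecked hw_init variant reports success and the error first shows up in late_init, which would match the log above. With the check in place, the failure would surface in hw_init instead, which is what the suggested debug change is meant to reveal.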