I recall that there was a previous discussion around this, and at that time we found that the range had already been set earlier, during DPM enablement.
The suspected root cause was the thermal alert being disabled and re-enabled inside this call, which sets the range a second time.
Lijo
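To make the two concerns in this thread concrete, here is a toy C model. Everything in it (struct thermal_model, model_set_temperature_range(), model_raises_interrupt(), the temperature values) is hypothetical and is not amdgpu code. It only assumes, as described above, that setting the range disables the thermal alert, programs the limits, and re-enables the alert. It shows that never programming the range leaves no thermal interrupt path (Alex's objection), while calling the setter a second time from late_init leaves the range unchanged but repeats the alert disable/enable sequence (the suspected trigger).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a thermal alert block; NOT an amdgpu API. */
struct thermal_model {
	int min_temp;       /* programmed lower bound, millidegrees C */
	int max_temp;       /* programmed upper bound, millidegrees C */
	bool range_set;     /* has a range been programmed yet? */
	bool alert_enabled; /* is the thermal interrupt currently enabled? */
	int alert_toggles;  /* count of alert enable/disable transitions */
};

/*
 * Models the assumed sequence: disable alert -> program range -> enable alert.
 * Returns -22 (-EINVAL) for an empty range; that check is part of the model only.
 */
static int model_set_temperature_range(struct thermal_model *t, int min, int max)
{
	if (min >= max)
		return -22;

	if (t->alert_enabled) {
		t->alert_enabled = false; /* disable before reprogramming */
		t->alert_toggles++;
	}

	t->min_temp = min;
	t->max_temp = max;
	t->range_set = true;

	t->alert_enabled = true; /* re-enable after reprogramming */
	t->alert_toggles++;
	return 0;
}

/* An interrupt is only delivered if a range was programmed and the alert is on. */
static bool model_raises_interrupt(const struct thermal_model *t, int temp)
{
	return t->range_set && t->alert_enabled &&
	       (temp < t->min_temp || temp > t->max_temp);
}
```

In this model, the first call (standing in for DPM enablement) performs one alert transition, and a second call with the same limits (standing in for late_init) adds two more, taking alert_toggles from 1 to 3 without changing the range. Whether one of those extra transitions is what fails on the reporter's arm64 system is exactly the open question in the thread.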
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Friday, March 10, 2023 8:51:06 PM
To: Chen, Guchun <Guchun.Chen@xxxxxxx>
Cc: David Airlie <airlied@xxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Zhenneng Li <lizhenneng@xxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: resove reboot exception for si oland
On Fri, Mar 10, 2023 at 3:18 AM Chen, Guchun <Guchun.Chen@xxxxxxx> wrote:
>
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> > Zhenneng Li
> > Sent: Friday, March 10, 2023 3:40 PM
> > To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> > Cc: David Airlie <airlied@xxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>;
> > linux-kernel@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Zhenneng Li
> > <lizhenneng@xxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Daniel Vetter
> > <daniel@xxxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
> > Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
> >
> > During reboot testing on an arm64 platform, boot may fail.
> >
> > The error messages are as follows:
> > [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
> > [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> > [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> > ---
> > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > index d6d9e3b1b2c0..dee51c757ac0 100644
> > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
> > if (!adev->pm.dpm_enabled)
> > return 0;
> >
> > - ret = si_set_temperature_range(adev);
> > - if (ret)
> > - return ret;
>
> si_set_temperature_range should be platform-agnostic. Can you please elaborate?
>
Yes. Not setting this means we won't get thermal interrupts. We
shouldn't skip this.
Alex
> Regards,
> Guchun
>
> > #if 0 //TODO ?
> > si_dpm_powergate_uvd(adev, true);
> > #endif
> > --
> > 2.25.1
>