RE: [PATCH] drm/amdgpu: add mb for si

"Quan, Evan" <Evan.Quan@xxxxxxx> · Fri, 25 Nov 2022 02:06:34 +0000

[AMD Official Use Only - General]

Is that what you received? It was a patch I generated with git format-patch.
Anyway, I have pasted the changes below. I suspect we need to wait for the SMC to actually be running after it has been started.

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 49c398ec0aaf..9f308a021b2d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -6814,6 +6814,7 @@ static int si_dpm_enable(struct amdgpu_device *adev)
        struct si_power_info *si_pi = si_get_pi(adev);
        struct amdgpu_ps *boot_ps = adev->pm.dpm.boot_ps;
        int ret;
+       int i;

        if (amdgpu_si_is_smc_running(adev))
                return -EINVAL;
@@ -6909,6 +6910,17 @@ static int si_dpm_enable(struct amdgpu_device *adev)
        si_program_response_times(adev);
        si_program_ds_registers(adev);
        si_dpm_start_smc(adev);
+       /* Wait for the SMC to come alive */
+       for (i = 0; i < adev->usec_timeout; i++) {
+               if (amdgpu_si_is_smc_running(adev))
+                       break;
+               udelay(1);
+       }
+       if (i >= adev->usec_timeout) {
+               DRM_ERROR("Timed out waiting for the SMC to start running\n");
+               return -EINVAL;
+       }
+
        ret = si_notify_smc_display_change(adev, false);
        if (ret) {
                DRM_ERROR("si_notify_smc_display_change failed\n");

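For anyone who wants to reason about the wait loop outside the kernel, here is a minimal user-space sketch of the same bounded-poll pattern. It is only an illustration under stated assumptions: struct fake_smc, fake_smc_is_running(), wait_for_smc() and the FAKE_* bit values are hypothetical stand-ins, not the driver's definitions, and the udelay(1) between polls is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the driver's values). */
#define FAKE_RST_REG    (1u << 0)
#define FAKE_CK_DISABLE (1u << 24)

/* Simulated SMC: stays in reset with its clock gated for `ready_after` polls. */
struct fake_smc {
	unsigned int polls;
	unsigned int ready_after;
};

/* Mirrors the shape of the running check: out of reset AND clock enabled. */
static bool fake_smc_is_running(struct fake_smc *smc)
{
	uint32_t rst = FAKE_RST_REG;
	uint32_t clk = FAKE_CK_DISABLE;

	assert(smc != NULL);
	if (smc->polls++ >= smc->ready_after) {
		rst = 0;
		clk = 0;
	}
	return !(rst & FAKE_RST_REG) && !(clk & FAKE_CK_DISABLE);
}

/* Bounded poll: returns 0 once the SMC reports running, -1 on timeout. */
static int wait_for_smc(struct fake_smc *smc, unsigned int timeout)
{
	unsigned int i;

	for (i = 0; i < timeout; i++) {
		if (fake_smc_is_running(smc))
			return 0;
		/* The kernel patch calls udelay(1) here between polls. */
	}
	return -1;
}
```

With ready_after = 3 and a timeout of 100, wait_for_smc() succeeds on the fourth poll; with ready_after larger than the timeout it gives up and returns -1, which corresponds to the error path in the patch.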

BR
Evan
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
> Sent: Thursday, November 24, 2022 6:06 PM
> To: Quan, Evan <Evan.Quan@xxxxxxx>; 李真能 <lizhenneng@xxxxxxxxxx>;
> Michel Dänzer <michel.daenzer@xxxxxxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; Pan, Xinhui <Xinhui.Pan@xxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] drm/amdgpu: add mb for si
> 
> That's not a patch but some binary file?
> 
> Christian.
> 
> On 24.11.22 at 11:04, Quan, Evan wrote:
> > [AMD Official Use Only - General]
> >
> > Could the attached patch help?
> >
> > Evan
> >> -----Original Message-----
> >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of 李真能
> >> Sent: Friday, November 18, 2022 5:25 PM
> >> To: Michel Dänzer <michel.daenzer@xxxxxxxxxxx>; Koenig, Christian
> >> <Christian.Koenig@xxxxxxx>; Deucher, Alexander
> >> <Alexander.Deucher@xxxxxxx>
> >> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Pan, Xinhui <Xinhui.Pan@xxxxxxx>;
> >> linux-kernel@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] drm/amdgpu: add mb for si
> >>
> >>
> >> On 2022/11/18 17:18, Michel Dänzer wrote:
> >>> On 11/18/22 09:01, Christian König wrote:
> >>>> On 18.11.22 at 08:48, Zhenneng Li wrote:
> >>>>> During reboot tests on an arm64 platform, boot may occasionally
> >>>>> fail, so add this mb() in the SMC code.
> >>>>>
> >>>>> The error messages are as follows:
> >>>>> [    6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
> >>>>> [    7.006919][ 7] [  T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> >>>>> [    7.014224][ 7] [  T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> >>>> Memory barriers are not supposed to be sprinkled around like this;
> >>>> you need to give a detailed explanation of why this is necessary.
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> Signed-off-by: Zhenneng Li <lizhenneng@xxxxxxxxxx>
> >>>>> ---
> >>>>>     drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
> >>>>>     1 file changed, 2 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> >>>>> index 8f994ffa9cd1..c7656f22278d 100644
> >>>>> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> >>>>> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> >>>>> @@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
> >>>>>         u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
> >>>>>         u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
> >>>>> +       mb();
> >>>>> +
> >>>>>         if (!(rst & RST_REG) && !(clk & CK_DISABLE))
> >>>>>             return true;
> >>> In particular, it makes no sense in this specific place, since it
> >>> cannot directly affect the values of rst & clk.
> >>
> >> I think so too.
> >>
> >> But when I run the reboot test on nine desktop machines, one or two
> >> of them may report this error after hundreds or thousands of
> >> reboots. Initially I used msleep() instead of mb(); both approaches
> >> work, but I don't know what the root cause is.
> >>
> >> I used this method on another vendor's Oland card, and the error
> >> message was reported again.
> >>
> >> What could be the root cause?
> >>
> >> Test environment:
> >>
> >> graphics card: OLAND 0x1002:0x6611 0x1642:0x1869 0x87
> >>
> >> driver: amdgpu
> >>
> >> os: Ubuntu 20.04
> >>
> >> platform: arm64
> >>
> >> kernel: 5.4.18
> >>
<<attachment: winmail.dat>>