RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

"Liang, Prike" <Prike.Liang@xxxxxxx> · Mon, 29 Jan 2024 08:35:01 +0000



[AMD Official Use Only - General]

> Sent: Friday, January 26, 2024 9:43 AM
> To: Alex Deucher <alexdeucher@xxxxxxxxx>
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Deepak
> <Deepak.Sharma@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> [AMD Official Use Only - General]
>
> [AMD Official Use Only - General]
>
> > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > Sent: Thursday, January 25, 2024 11:24 PM
> > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> > <Alexander.Deucher@xxxxxxx>; Sharma, Deepak
> <Deepak.Sharma@xxxxxxx>
> > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers
> > for PM abort case
> >
> > On Thu, Jan 25, 2024 at 10:22 AM Alex Deucher <alexdeucher@xxxxxxxxx>
> > wrote:
> > >
> > > On Wed, Jan 24, 2024 at 9:39 PM Liang, Prike <Prike.Liang@xxxxxxx>
> > wrote:
> > > >
> > > > [AMD Official Use Only - General]
> > > >
> > > > Hi, Alex
> > > > > -----Original Message-----
> > > > > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > > > > Sent: Wednesday, January 24, 2024 11:59 PM
> > > > > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > > > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> > > > > <Alexander.Deucher@xxxxxxx>; Sharma, Deepak
> > > > > <Deepak.Sharma@xxxxxxx>
> > > > > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC
> > > > > registers for PM abort case
> > > > >
> > > > > On Wed, Jan 24, 2024 at 2:12 AM Prike Liang
> > > > > <Prike.Liang@xxxxxxx>
> > wrote:
> > > > > >
> > > > > > In the PM abort cases, the gfx power rail doesn't turn off so
> > > > > > some GFXDEC registers/CSB can't reset to default vaule. In
> > > > > > order to avoid unexpected problem now need skip to program
> > > > > > GFXDEC registers and bypass issue CSB packet for PM abort case.
> > > > > >
> > > > > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > > > > > ---
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++++
> > > > > >  3 files changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > index c5f3859fd682..26d983eb831b 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > > > > >         bool                            in_s3;
> > > > > >         bool                            in_s4;
> > > > > >         bool                            in_s0ix;
> > > > > > +       bool                            pm_complete;
> > > > > >
> > > > > >         enum pp_mp1_state               mp1_state;
> > > > > >         struct amdgpu_doorbell_index doorbell_index; diff
> > > > > > --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > index 475bd59c9ac2..a01f9b0c2f30 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > @@ -2486,6 +2486,7 @@ static int
> > > > > > amdgpu_pmops_suspend_noirq(struct
> > > > > device *dev)
> > > > > >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > > > >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > > > > >
> > > > > > +       adev->pm_complete = true;
> > > > >
> > > > > This needs to be cleared somewhere on resume.
> > > > [Liang, Prike]  This flag is designed to indicate the amdgpu
> > > > device
> > suspension process status and will update the patch and clear it at
> > the amdgpu suspension beginning point.
> > > > >
> > > > > >         if (amdgpu_acpi_should_gpu_reset(adev))
> > > > > >                 return amdgpu_asic_reset(adev);
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > index 57808be6e3ec..3bf51f18e13c 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct
> > > > > > amdgpu_device *adev)
> > > > > >
> > > > > >         gfx_v9_0_cp_gfx_enable(adev, true);
> > > > > >
> > > > > > +       if (adev->in_suspend && !adev->pm_complete) {
> > > > > > +               DRM_INFO(" will skip the csb ring write\n");
> > > > > > +               return 0;
> > > > > > +       }
> > > > >
> > > > > We probably want a similar fix for other gfx generations as well.
> > > > >
> > > > > Alex
> > > > >
> > > > [Liang, Prike] IIRC, there's no issue on the Mendocino side even
> > > > without
> > the fix. How about keep the other gfx generations unchanged firstly
> > and after sort out the failed case will add the quirk for each specific gfx
> respectively?
> > >
> > > Mendocino only supports S0i3 so we don't touch gfx on suspend/resume.
> > > This would only happen on platforms that support S3.
> >
> > E.g., try an aborted suspend on Raphael or PHX2.
> >
> > Alex
> >
> [Liang, Prike]  Thanks for the reminder, but the Mendocino also was verified
> on the system with S3 enabled from BIOS. I will double confirm if there need
> the quirk on the RPL or PHX2.
>
[Prike]  According to @Zhang, Jesse(Jie) and @Huang, Tim further confirm there's no such problem on the RPL and PHX, so we may only need apply this quirk on some specific gfx9 series.
> > >
> > > Alex
> > >
> > > >
> > > > > >         r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 +
> 3);
> > > > > >         if (r) {
> > > > > >                 DRM_ERROR("amdgpu: cp failed to lock ring
> > > > > > (%d).\n", r);
> > > > > > --
> > > > > > 2.34.1
> > > > > >