[AMD Official Use Only - General]

> Sent: Friday, January 26, 2024 9:43 AM
> To: Alex Deucher <alexdeucher@xxxxxxxxx>
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Sharma, Deepak
> <Deepak.Sharma@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > Sent: Thursday, January 25, 2024 11:24 PM
> > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> > <Alexander.Deucher@xxxxxxx>; Sharma, Deepak <Deepak.Sharma@xxxxxxx>
> > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers
> > for PM abort case
> >
> > On Thu, Jan 25, 2024 at 10:22 AM Alex Deucher <alexdeucher@xxxxxxxxx>
> > wrote:
> > >
> > > On Wed, Jan 24, 2024 at 9:39 PM Liang, Prike <Prike.Liang@xxxxxxx>
> > > wrote:
> > > >
> > > > Hi, Alex
> > > > > -----Original Message-----
> > > > > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > > > > Sent: Wednesday, January 24, 2024 11:59 PM
> > > > > To: Liang, Prike <Prike.Liang@xxxxxxx>
> > > > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> > > > > <Alexander.Deucher@xxxxxxx>; Sharma, Deepak
> > > > > <Deepak.Sharma@xxxxxxx>
> > > > > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC
> > > > > registers for PM abort case
> > > > >
> > > > > On Wed, Jan 24, 2024 at 2:12 AM Prike Liang
> > > > > <Prike.Liang@xxxxxxx> wrote:
> > > > > >
> > > > > > In the PM abort case, the gfx power rail doesn't turn off, so
> > > > > > some GFXDEC registers/CSB can't be reset to their default values.
> > > > > > To avoid unexpected problems, skip programming the GFXDEC
> > > > > > registers and bypass issuing the CSB packet in the PM abort case.
> > > > > >
> > > > > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > > > > > ---
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++++
> > > > > >  3 files changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > index c5f3859fd682..26d983eb831b 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > > > > >         bool                            in_s3;
> > > > > >         bool                            in_s4;
> > > > > >         bool                            in_s0ix;
> > > > > > +       bool                            pm_complete;
> > > > > >
> > > > > >         enum pp_mp1_state               mp1_state;
> > > > > >         struct amdgpu_doorbell_index doorbell_index;
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > index 475bd59c9ac2..a01f9b0c2f30 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > @@ -2486,6 +2486,7 @@ static int amdgpu_pmops_suspend_noirq(struct device *dev)
> > > > > >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > > > >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > > > > >
> > > > > > +       adev->pm_complete = true;
> > > > >
> > > > > This needs to be cleared somewhere on resume.
> > > > [Liang, Prike] This flag indicates the status of the amdgpu device
> > > > suspend process. I will update the patch to clear it at the
> > > > beginning of the amdgpu suspend.
> > > > >
> > > > > >         if (amdgpu_acpi_should_gpu_reset(adev))
> > > > > >                 return amdgpu_asic_reset(adev);
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > index 57808be6e3ec..3bf51f18e13c 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct amdgpu_device *adev)
> > > > > >
> > > > > >         gfx_v9_0_cp_gfx_enable(adev, true);
> > > > > >
> > > > > > +       if (adev->in_suspend && !adev->pm_complete) {
> > > > > > +               DRM_INFO(" will skip the csb ring write\n");
> > > > > > +               return 0;
> > > > > > +       }
> > > > >
> > > > > We probably want a similar fix for other gfx generations as well.
> > > > >
> > > > > Alex
> > > > >
> > > > [Liang, Prike] IIRC, there's no issue on the Mendocino side even
> > > > without the fix. How about keeping the other gfx generations
> > > > unchanged for now and, once the failing case is sorted out, adding
> > > > the quirk for each specific gfx generation?
> > >
> > > Mendocino only supports S0i3 so we don't touch gfx on suspend/resume.
> > > This would only happen on platforms that support S3.
> >
> > E.g., try an aborted suspend on Raphael or PHX2.
> >
> > Alex
> >
> [Liang, Prike] Thanks for the reminder, but Mendocino was also verified
> on a system with S3 enabled in the BIOS. I will double-check whether the
> quirk is needed on RPL or PHX2.

[Prike] According to further confirmation from @Zhang, Jesse(Jie) and @Huang, Tim, there's no such problem on RPL and PHX, so we may only need to apply this quirk to some specific gfx9 series.

> > >
> > > Alex
> > >
> > > > > >
> > > > > >         r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
> > > > > >         if (r) {
> > > > > >                 DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n", r);
> > > > > > --
> > > > > > 2.34.1
> > > > > >
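
For reference, the flag lifecycle discussed in this thread (set pm_complete in the noirq suspend stage, clear it at the beginning of suspend, and test it in gfx_v9_0_cp_gfx_start()) can be sketched as a small user-space model. This is only an illustration under simplifying assumptions, not driver code: struct model_dev and all model_* functions are hypothetical names, and the handling of in_suspend is reduced to "set at suspend start, cleared at the end of resume".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two amdgpu_device fields under discussion. */
struct model_dev {
	bool in_suspend;   /* a suspend/resume cycle is in flight */
	bool pm_complete;  /* the noirq suspend stage was reached */
};

/* Beginning of suspend: clear the flag so a stale value cannot survive. */
static void model_suspend_begin(struct model_dev *d)
{
	d->in_suspend = true;
	d->pm_complete = false;
}

/* noirq stage: in this model it is reached only if suspend was not aborted. */
static void model_suspend_noirq(struct model_dev *d)
{
	assert(d->in_suspend);
	d->pm_complete = true;
}

/*
 * Mirrors the check added to gfx_v9_0_cp_gfx_start() in the patch.
 * Returns true if the CSB ring write would be issued, false if skipped.
 */
static bool model_cp_gfx_start(const struct model_dev *d)
{
	if (d->in_suspend && !d->pm_complete)
		return false;
	return true;
}

/* Resume path: run the gfx start check, then leave the suspend state. */
static bool model_resume(struct model_dev *d)
{
	bool wrote_csb = model_cp_gfx_start(d);

	d->in_suspend = false;
	return wrote_csb;
}

/* One full cycle; aborted means suspend stopped before the noirq stage. */
static bool model_cycle(struct model_dev *d, bool aborted)
{
	model_suspend_begin(d);
	if (!aborted)
		model_suspend_noirq(d);
	return model_resume(d);
}
```

In this model, a completed suspend followed by an aborted one skips the CSB write on the second resume only because model_suspend_begin() clears pm_complete. Without that clear, the value left over from the first cycle would make the aborted cycle look complete, which is the concern behind "This needs to be cleared somewhere on resume."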