[AMD Official Use Only - General] Hi, Alex > -----Original Message----- > From: Alex Deucher <alexdeucher@xxxxxxxxx> > Sent: Thursday, January 25, 2024 12:07 AM > To: Liang, Prike <Prike.Liang@xxxxxxx> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander > <Alexander.Deucher@xxxxxxx>; Sharma, Deepak > <Deepak.Sharma@xxxxxxx> > Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case > > On Wed, Jan 24, 2024 at 2:17 AM Prike Liang <Prike.Liang@xxxxxxx> wrote: > > > > In the pm abort case the gfx power rail not turn off from FCH side and > > this will lead to the gfx reinitialized failed base on the unknown gfx > > HW status, so let's reset the gpu to a known good power state. > > > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++++ > > drivers/gpu/drm/amd/amdgpu/soc15.c | 7 ++++++- > > 2 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > index 9153f69bad7f..14b66c49a536 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > @@ -2935,6 +2935,11 @@ static int psp_resume(void *handle) > > > > dev_info(adev->dev, "PSP is resuming...\n"); > > > > + if(amdgpu_asic_need_reset_on_init(adev)) { > > + DRM_INFO("PM abort case and let's reset asic \n"); > > + amdgpu_asic_reset(adev); > > + } > > + > > Seems like it would be better to put this in the resume callback. [Liang, Prike] Yes, it makes sense to put the device level function call in device resume callback. > > > if (psp->mem_train_ctx.enable_mem_training) { > > ret = psp_mem_training(psp, PSP_MEM_TRAIN_RESUME); > > if (ret) { > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c > > b/drivers/gpu/drm/amd/amdgpu/soc15.c > > index 15033efec2ba..6ec4f6958c4c 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c > > @@ -804,9 +804,15 @@ static bool soc15_need_reset_on_init(struct > amdgpu_device *adev) > > if (adev->asic_type == CHIP_RENOIR) > > return true; > > > > + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81); > > Is this register consistent for all soc15 chips? > > Alex > Yes, confirmed from the PSP firmware guys this register can be used for indicating the TOS/SOS live status on the SOC15 series. > > + > > /* Just return false for soc15 GPUs. Reset does not seem to > > * be necessary. > > */ > > + if (adev->in_suspend && !added->microplate && > > + sol_reg) > > + return true; > > + > > if (!amdgpu_passthrough(adev)) > > return false; > > > > @@ -816,7 +822,6 @@ static bool soc15_need_reset_on_init(struct > amdgpu_device *adev) > > /* Check sOS sign of life register to confirm sys driver and sOS > > * are already been loaded. > > */ > > - sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81); > > if (sol_reg) > > return true; > >> -- > > 2.34.1 > >