[AMD Official Use Only - General]

>
> On 1/25/2024 8:52 AM, Prike Liang wrote:
> > In the pm abort case the gfx power rail not turn off from FCH side and
> > this will lead to the gfx reinitialized failed base on the unknown gfx
> > HW status, so let's reset the gpu to a known good power state.
> >
>
> From the description, this is an APU-only problem (or this patch could only
> resolve the APU abort sequence). However, there is no check for APU in the
> patch below.
>
[Prike] IIRC, there is also a similar problem on the dGPU side when suspend is aborted; for now this patch is only drafted for a hot issue on the RV series. If needed, we can add a TODO item for drafting a more generic solution.

>
> > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
> >  drivers/gpu/drm/amd/amdgpu/soc15.c         | 8 +++++++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 56d9dfa61290..4c40ffaaa5c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> >  			return r;
> >  	}
> >
> > +	if(amdgpu_asic_need_reset_on_init(adev)) {
> > +		DRM_INFO("PM abort case and let's reset asic \n");
> > +		amdgpu_asic_reset(adev);
> > +	}
> > +
>
> suspend_noirq is specific to suspend scenarios and not valid for freeze/thaw.
> I guess this could trigger a reset for a successful restore on APUs.
>
[Prike] If it doesn't run into noirq_suspend, then we still need a further check of whether the PSP TOS is still alive before the gpu reset.

> >  	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> >  		return 0;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..9329a00b6abc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
> >  	if (adev->asic_type == CHIP_RENOIR)
> >  		return true;
> >
> > +	sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> >  	/* Just return false for soc15 GPUs. Reset does not seem to
> >  	 * be necessary.
> >  	 */
>
> The comment now doesn't make sense.
>
> Thanks,
> Lijo
>
> > +	if (adev->in_suspend && !adev->in_s0ix &&
> > +			!adev->pm_complete &&
> > +			sol_reg)
> > +		return true;
> > +
> >  	if (!amdgpu_passthrough(adev))
> >  		return false;
> >
> > @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
> >  	/* Check sOS sign of life register to confirm sys driver and sOS
> >  	 * are already been loaded.
> >  	 */
> > -	sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >  	if (sol_reg)
> >  		return true;
> >
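
For reference while discussing when the reset fires, the decision order in the patched soc15_need_reset_on_init() can be sketched as a standalone userspace model. This is only a rough sketch and not the driver code: struct mock_adev, its boolean fields, need_reset_on_init_sketch() and the self-check helper are hypothetical names, the register read is replaced by a plain field, and the final fall-through result is an assumption because the end of the function is not visible in the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few amdgpu_device fields the patch reads. */
struct mock_adev {
	bool is_renoir;   /* models adev->asic_type == CHIP_RENOIR */
	bool in_suspend;  /* models adev->in_suspend */
	bool in_s0ix;     /* models adev->in_s0ix */
	bool pm_complete; /* models adev->pm_complete */
	bool passthrough; /* models amdgpu_passthrough(adev) */
	uint32_t sol_reg; /* models the value read from mmMP0_SMN_C2PMSG_81 */
};

/* Mirrors the order of checks shown in the quoted soc15.c hunks. */
bool need_reset_on_init_sketch(const struct mock_adev *adev)
{
	if (adev->is_renoir)
		return true;

	/* New check from the patch: aborted (non-s0ix) suspend with sOS alive. */
	if (adev->in_suspend && !adev->in_s0ix &&
	    !adev->pm_complete && adev->sol_reg)
		return true;

	if (!adev->passthrough)
		return false;

	/* Assumption: the function returns false when sol_reg is zero here. */
	return adev->sol_reg != 0;
}

/* Small self-check covering the abort case and two non-reset cases. */
void need_reset_sketch_self_check(void)
{
	struct mock_adev aborted = { .in_suspend = true, .sol_reg = 1 };
	struct mock_adev completed = { .in_suspend = true, .pm_complete = true,
				       .sol_reg = 1 };
	struct mock_adev s0ix = { .in_suspend = true, .in_s0ix = true,
				  .sol_reg = 1 };

	assert(need_reset_on_init_sketch(&aborted));
	assert(!need_reset_on_init_sketch(&completed));
	assert(!need_reset_on_init_sketch(&s0ix));
}
```

In this model the new condition never looks at whether the device is an APU, and it says nothing about freeze/thaw, which is where the two review comments above apply.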