On 1/26/2024 2:30 PM, Liang, Prike wrote:
> [AMD Official Use Only - General]
>
>>
>> On 1/25/2024 8:52 AM, Prike Liang wrote:
>>> In the pm abort case the gfx power rail not turn off from FCH side and
>>> this will lead to the gfx reinitialized failed base on the unknown gfx
>>> HW status, so let's reset the gpu to a known good power state.
>>>
>>
>> From the description, this an APU only problem (or this patch could only
>> resolve APU abort sequence). However, there is no check for APU in the patch
>> below.
>>
> [Prike] IIRC, there also has a similar problem on the dGPU side when suspend abort and
> now this patch is only drafted for a hot issue on the RV series. If need we can add a TODO
> item for drafting a more generic solution.
>

If this addresses a specific issue, then it is better to check for the
specific IP revision rather than presenting this as a generic fix. Presently
the patch logic treats this as generic for all soc15 ASICs.

>>
>>> Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>>>   drivers/gpu/drm/amd/amdgpu/soc15.c         | 8 +++++++-
>>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 56d9dfa61290..4c40ffaaa5c2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device
>> *dev, bool fbcon)
>>>                   return r;
>>>           }
>>>
>>> +        if(amdgpu_asic_need_reset_on_init(adev)) {
>>> +                DRM_INFO("PM abort case and let's reset asic \n");
>>> +                amdgpu_asic_reset(adev);
>>> +        }
>>> +
>>
>> suspend_noirq is specific for suspend scenarios and not valid for freeze/thaw.
>> I guess this could trigger reset for successful restore on APUs.
>>
> [Prike] If doesn't run into noirq_suspend then still need further check whether the PSP TOS is still alive before gpu reset.
>

AFAIU, for a successful resume from hibernate on APUs, TOS will still be
running. The patch will trigger a reset in such cases also.

Thanks,
Lijo

>>>           if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>>>                   return 0;
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 15033efec2ba..9329a00b6abc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct
>> amdgpu_device *adev)
>>>           if (adev->asic_type == CHIP_RENOIR)
>>>                   return true;
>>>
>>> +        sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
>>> +
>>>           /* Just return false for soc15 GPUs.  Reset does not seem to
>>>            * be necessary.
>>>            */
>>
>> The comment now doesn't make sense.
>>
>> Thanks,
>> Lijo
>>
>>> +        if (adev->in_suspend && !adev->in_s0ix &&
>>> +                        !adev->pm_complete &&
>>> +                        sol_reg)
>>> +                return true;
>>> +
>>>           if (!amdgpu_passthrough(adev))
>>>                   return false;
>>>
>>> @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct
>> amdgpu_device *adev)
>>>           /* Check sOS sign of life register to confirm sys driver and sOS
>>>            * are already been loaded.
>>>            */
>>> -        sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
>>>           if (sol_reg)
>>>                   return true;
>>>
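
The two review concerns raised in this thread (no APU or IP check, and a reset
firing after a successful hibernate restore) can be illustrated with a small
standalone C model. This is not driver code and not the patch author's final
logic: struct mock_dev, its fields, and mock_need_reset_after_pm_abort() are
hypothetical stand-ins, and both the APU-only gate and the skip-on-hibernate
gate are assumptions drawn from the comments above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few device fields discussed in the thread. */
struct mock_dev {
	bool is_apu;       /* assumption: the issue is limited to APUs */
	bool in_suspend;   /* mirrors the in_suspend test in the quoted hunk */
	bool in_s0ix;      /* mirrors the in_s0ix test in the quoted hunk */
	bool in_hibernate; /* assumption: set on the freeze/thaw/restore path */
	bool pm_complete;  /* mirrors the pm_complete test in the quoted hunk */
	uint32_t sol_reg;  /* nonzero when the PSP sign-of-life register is set */
};

/*
 * Returns true only for the narrow case the commit message describes:
 * an aborted suspend on an APU where the PSP is still alive.
 */
static bool mock_need_reset_after_pm_abort(const struct mock_dev *d)
{
	/* Review point 1: do not treat this as generic for every ASIC. */
	if (!d->is_apu)
		return false;

	/* Review point 2: a live TOS after hibernate restore is expected. */
	if (d->in_hibernate)
		return false;

	/* Same condition as the quoted hunk, evaluated after the two gates. */
	if (!d->in_suspend || d->in_s0ix || d->pm_complete)
		return false;

	return d->sol_reg != 0;
}
```

With these gates, a dGPU or a successful restore on an APU returns false even
when the sign-of-life register is nonzero, which is the behaviour the review
comments ask for.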