On 10/24/2024 12:23 PM, Liang, Prike wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >> Sent: Thursday, October 24, 2024 11:39 AM >> To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> >> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete >> >> >> >> On 10/24/2024 8:24 AM, Liang, Prike wrote: >>> [Public] >>> >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >>>> Sent: Wednesday, October 23, 2024 6:55 PM >>>> To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> >>>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete >>>> >>>> >>>> >>>> On 10/14/2024 1:19 PM, Prike Liang wrote: >>>>> To check the status of S3 suspend completion, use the PM core >>>>> pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore, >>>>> clean up the AMDGPU driver's private flag suspend_complete. >>>>> >>>>> Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- >>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++-- >>>>> drivers/gpu/drm/amd/amdgpu/soc15.c | 7 ++----- >>>>> drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +- >>>>> 5 files changed, 5 insertions(+), 12 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>>>> index 48c9b9b06905..9b35763ae0a7 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>>>> @@ -1111,8 +1111,6 @@ struct amdgpu_device { >>>>> bool in_s3; >>>>> bool in_s4; >>>>> bool in_s0ix; >>>>> - /* indicate amdgpu suspension status */ >>>>> - bool suspend_complete; >>>>> >>>>> enum pp_mp1_state mp1_state; >>>>> struct amdgpu_doorbell_index doorbell_index; diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>> index 680e44fdee6e..78972151b970 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>> @@ -2501,7 +2501,6 @@ static int amdgpu_pmops_suspend(struct device >> *dev) >>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); >>>>> struct amdgpu_device *adev = drm_to_adev(drm_dev); >>>>> >>>>> - adev->suspend_complete = false; >>>>> if (amdgpu_acpi_is_s0ix_active(adev)) >>>>> adev->in_s0ix = true; >>>>> else if (amdgpu_acpi_is_s3_active(adev)) @@ -2516,7 +2515,6 @@ >>>>> static int amdgpu_pmops_suspend_noirq(struct device *dev) >>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); >>>>> struct amdgpu_device *adev = drm_to_adev(drm_dev); >>>>> >>>>> - adev->suspend_complete = true; >>>>> if (amdgpu_acpi_should_gpu_reset(adev)) >>>>> return amdgpu_asic_reset(adev); >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>>> index be320d753507..ba8e66744376 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>>> @@ -3276,8 +3276,8 @@ static int gfx_v9_0_cp_gfx_start(struct >>>>> amdgpu_device >>>> *adev) >>>>> * confirmed that the APU gfx10/gfx11 needn't such update. >>>>> */ >>>>> if (adev->flags & AMD_IS_APU && >>>>> - adev->in_s3 && !adev->suspend_complete) { >>>>> - DRM_INFO(" Will skip the CSB packet resubmit\n"); >>>>> + adev->in_s3 && !pm_resume_via_firmware()) { >>>>> + DRM_INFO("Will skip the CSB packet resubmit\n"); >>>>> return 0; >>>>> } >>>>> r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + >>>>> 3); diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c >>>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c >>>>> index 12ff6cf568dc..d9d11131a744 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c >>>>> @@ -584,13 +584,10 @@ static bool soc15_need_reset_on_resume(struct >>>> amdgpu_device *adev) >>>>> * performing pm core test. >>>>> */ >>>>> if (adev->flags & AMD_IS_APU && adev->in_s3 && >>>>> - !pm_resume_via_firmware()) { >>>>> - adev->suspend_complete = false; >>>>> + !pm_resume_via_firmware()) >>>>> return true; >>>>> - } else { >>>>> - adev->suspend_complete = true; >>>>> + else >>>>> return false; >>>>> - } >>>>> } >>>>> >>>>> static int soc15_asic_reset(struct amdgpu_device *adev) diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/soc21.c >>>>> b/drivers/gpu/drm/amd/amdgpu/soc21.c >>>>> index c4b950e75133..7a47a21ef00f 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c >>>>> @@ -904,7 +904,7 @@ static bool soc21_need_reset_on_resume(struct >>>> amdgpu_device *adev) >>>>> * 2) S3 suspend got aborted and TOS is active. >>>>> */ >>>>> if (!(adev->flags & AMD_IS_APU) && adev->in_s3 && >>>>> - !adev->suspend_complete) { >>>>> + !pm_resume_via_firmware()) { >>>> >>>> Looks like this will cover only ACPI based systems. Not sure if that >>>> assumption is valid for dGPU cases. >>>> >>>> Thanks, >>>> Lijo >>> >>> Yes, the pm_set_resume_via_firmware() function is only called during the >> ACPI_STATE_S3 suspend process. However, ACPI-enabled systems are popular in >> the desktop world. If there are concerns about ACPI configuration, one option could >> be to check if the dGPU needs a reset by directly checking the SOL register. As far >> as I can see, when the dGPU completes its suspend process, the SOL value will >> remain zero until the dGPU is resumed. Conversely, in the case of a suspend abort, >> the SOL value will be non-zero. >>> >> >> in_s3 is set for dGPU in case of s0ix as well. Probably, that's the only case where >> need the flag to avoid unnecessary reset. Otherwise SOL check could be sufficient. >> >> Thanks, >> Lijo >> > Do you mean we need to include S0ix to reset the dGPU during an S0ix suspend abort? However, the in_s0ix state of the dGPU should always be false, and there is no specific suspension handler for the dGPU in S0ix. As a PCIe endpoint, the dGPU should be powered off during system-wide(S0ix and Sx) suspend, and the SOL will be reset to 0 during the suspend process. So, for the dGPU resume case, do you think it's enough to detect the suspend abort event by only checking SOL without any Sx filter? > For S0ix, I don't think there is a requirement to turn off all endpoints - for dGPUs that don't support D3 or runpm etc. Then even if S0ix entry got aborted, but GPU is suspended properly there is no need to reset the device. Thanks, Lijo >>> Thanks, >>> Prike >>>> >>>>> sol_reg1 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81); >>>>> msleep(100); >>>>> sol_reg2 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81);