[AMD Official Use Only - AMD Internal Distribution Only] > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > Sent: Thursday, October 24, 2024 3:30 PM > To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> > Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete > > > > On 10/24/2024 12:49 PM, Lazar, Lijo wrote: > > > > > > On 10/24/2024 12:23 PM, Liang, Prike wrote: > >> [AMD Official Use Only - AMD Internal Distribution Only] > >> > >>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >>> Sent: Thursday, October 24, 2024 11:39 AM > >>> To: Liang, Prike <Prike.Liang@xxxxxxx>; > >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx > >>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> > >>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the > >>> suspend_complete > >>> > >>> > >>> > >>> On 10/24/2024 8:24 AM, Liang, Prike wrote: > >>>> [Public] > >>>> > >>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > >>>>> Sent: Wednesday, October 23, 2024 6:55 PM > >>>>> To: Liang, Prike <Prike.Liang@xxxxxxx>; > >>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx > >>>>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> > >>>>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the > >>>>> suspend_complete > >>>>> > >>>>> > >>>>> > >>>>> On 10/14/2024 1:19 PM, Prike Liang wrote: > >>>>>> To check the status of S3 suspend completion, use the PM core > >>>>>> pm_suspend_global_flags bit(1) to detect S3 abort events. > >>>>>> Therefore, clean up the AMDGPU driver's private flag suspend_complete. > >>>>>> > >>>>>> Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx> > >>>>>> --- > >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -- > >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- > >>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++-- > >>>>>> drivers/gpu/drm/amd/amdgpu/soc15.c | 7 ++----- > >>>>>> drivers/gpu/drm/amd/amdgpu/soc21.c | 2 +- > >>>>>> 5 files changed, 5 insertions(+), 12 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>>>>> index 48c9b9b06905..9b35763ae0a7 100644 > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>>>>> @@ -1111,8 +1111,6 @@ struct amdgpu_device { > >>>>>> bool in_s3; > >>>>>> bool in_s4; > >>>>>> bool in_s0ix; > >>>>>> - /* indicate amdgpu suspension status */ > >>>>>> - bool suspend_complete; > >>>>>> > >>>>>> enum pp_mp1_state mp1_state; > >>>>>> struct amdgpu_doorbell_index doorbell_index; diff --git > >>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>> index 680e44fdee6e..78972151b970 100644 > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>> @@ -2501,7 +2501,6 @@ static int amdgpu_pmops_suspend(struct > >>>>>> device > >>> *dev) > >>>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); > >>>>>> struct amdgpu_device *adev = drm_to_adev(drm_dev); > >>>>>> > >>>>>> - adev->suspend_complete = false; > >>>>>> if (amdgpu_acpi_is_s0ix_active(adev)) > >>>>>> adev->in_s0ix = true; > >>>>>> else if (amdgpu_acpi_is_s3_active(adev)) @@ -2516,7 +2515,6 > >>>>>> @@ static int amdgpu_pmops_suspend_noirq(struct device *dev) > >>>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); > >>>>>> struct amdgpu_device *adev = drm_to_adev(drm_dev); > >>>>>> > >>>>>> - adev->suspend_complete = true; > >>>>>> if (amdgpu_acpi_should_gpu_reset(adev)) > >>>>>> return amdgpu_asic_reset(adev); > >>>>>> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > >>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > >>>>>> index be320d753507..ba8e66744376 100644 > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > >>>>>> @@ -3276,8 +3276,8 @@ static int gfx_v9_0_cp_gfx_start(struct > >>>>>> amdgpu_device > >>>>> *adev) > >>>>>> * confirmed that the APU gfx10/gfx11 needn't such update. > >>>>>> */ > >>>>>> if (adev->flags & AMD_IS_APU && > >>>>>> - adev->in_s3 && !adev->suspend_complete) { > >>>>>> - DRM_INFO(" Will skip the CSB packet resubmit\n"); > >>>>>> + adev->in_s3 && !pm_resume_via_firmware()) { > >>>>>> + DRM_INFO("Will skip the CSB packet resubmit\n"); > >>>>>> return 0; > >>>>>> } > >>>>>> r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + > >>>>>> 3); diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>> index 12ff6cf568dc..d9d11131a744 100644 > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c > >>>>>> @@ -584,13 +584,10 @@ static bool > >>>>>> soc15_need_reset_on_resume(struct > >>>>> amdgpu_device *adev) > >>>>>> * performing pm core test. > >>>>>> */ > >>>>>> if (adev->flags & AMD_IS_APU && adev->in_s3 && > >>>>>> - !pm_resume_via_firmware()) { > >>>>>> - adev->suspend_complete = false; > >>>>>> + !pm_resume_via_firmware()) > >>>>>> return true; > >>>>>> - } else { > >>>>>> - adev->suspend_complete = true; > >>>>>> + else > >>>>>> return false; > >>>>>> - } > >>>>>> } > >>>>>> > >>>>>> static int soc15_asic_reset(struct amdgpu_device *adev) diff > >>>>>> --git a/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>> b/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>> index c4b950e75133..7a47a21ef00f 100644 > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c > >>>>>> @@ -904,7 +904,7 @@ static bool soc21_need_reset_on_resume(struct > >>>>> amdgpu_device *adev) > >>>>>> * 2) S3 suspend got aborted and TOS is active. > >>>>>> */ > >>>>>> if (!(adev->flags & AMD_IS_APU) && adev->in_s3 && > >>>>>> - !adev->suspend_complete) { > >>>>>> + !pm_resume_via_firmware()) { > >>>>> > >>>>> Looks like this will cover only ACPI based systems. Not sure if > >>>>> that assumption is valid for dGPU cases. > >>>>> > >>>>> Thanks, > >>>>> Lijo > >>>> > >>>> Yes, the pm_set_resume_via_firmware() function is only called > >>>> during the > >>> ACPI_STATE_S3 suspend process. However, ACPI-enabled systems are > >>> popular in the desktop world. If there are concerns about ACPI > >>> configuration, one option could be to check if the dGPU needs a > >>> reset by directly checking the SOL register. As far as I can see, > >>> when the dGPU completes its suspend process, the SOL value will > >>> remain zero until the dGPU is resumed. Conversely, in the case of a suspend > abort, the SOL value will be non-zero. > >>>> > >>> > >>> in_s3 is set for dGPU in case of s0ix as well. Probably, that's the > >>> only case where need the flag to avoid unnecessary reset. Otherwise SOL > check could be sufficient. > >>> > >>> Thanks, > >>> Lijo > >>> > >> Do you mean we need to include S0ix to reset the dGPU during an S0ix suspend > abort? However, the in_s0ix state of the dGPU should always be false, and there is > no specific suspension handler for the dGPU in S0ix. As a PCIe endpoint, the dGPU > should be powered off during system-wide(S0ix and Sx) suspend, and the SOL will > be reset to 0 during the suspend process. So, for the dGPU resume case, do you > think it's enough to detect the suspend abort event by only checking SOL without > any Sx filter? > >> > > For S0ix, I don't think there is a requirement to turn off all > > endpoints Yes, that only requires the _DSM listed devices reach its deepest sleep level. > > - for dGPUs that don't support D3 or runpm etc. > > To clarify, "don't support D3" => what I meant is when D3 entry is prevented. IMO this case can be filtered out by checking the SOL register value. > Thanks, > Lijo > > Then even if S0ix entry > > got aborted, but GPU is suspended properly there is no need to reset > > the device. > > > > Thanks, > > Lijo > > > >>>> Thanks, > >>>> Prike > >>>>> > >>>>>> sol_reg1 = RREG32_SOC15(MP0, 0, > regMP0_SMN_C2PMSG_81); > >>>>>> msleep(100); > >>>>>> sol_reg2 = RREG32_SOC15(MP0, 0, > >>>>>> regMP0_SMN_C2PMSG_81);