RE: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete

"Liang, Prike" <Prike.Liang@xxxxxxx> · Thu, 24 Oct 2024 07:36:20 +0000

[AMD Official Use Only - AMD Internal Distribution Only]

> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> Sent: Thursday, October 24, 2024 3:30 PM
> To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete
>
>
>
> On 10/24/2024 12:49 PM, Lazar, Lijo wrote:
> >
> >
> > On 10/24/2024 12:23 PM, Liang, Prike wrote:
> >> [AMD Official Use Only - AMD Internal Distribution Only]
> >>
> >>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>> Sent: Thursday, October 24, 2024 11:39 AM
> >>> To: Liang, Prike <Prike.Liang@xxxxxxx>;
> >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> >>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the
> >>> suspend_complete
> >>>
> >>>
> >>>
> >>> On 10/24/2024 8:24 AM, Liang, Prike wrote:
> >>>> [Public]
> >>>>
> >>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>>> Sent: Wednesday, October 23, 2024 6:55 PM
> >>>>> To: Liang, Prike <Prike.Liang@xxxxxxx>;
> >>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
> >>>>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the
> >>>>> suspend_complete
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 10/14/2024 1:19 PM, Prike Liang wrote:
> >>>>>> To check the status of S3 suspend completion, use the PM core
> >>>>>> pm_suspend_global_flags bit(1) to detect S3 abort events.
> >>>>>> Therefore, clean up the AMDGPU driver's private flag suspend_complete.
> >>>>>>
> >>>>>> Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 2 --
> >>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 --
> >>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++--
> >>>>>>  drivers/gpu/drm/amd/amdgpu/soc15.c      | 7 ++-----
> >>>>>>  drivers/gpu/drm/amd/amdgpu/soc21.c      | 2 +-
> >>>>>>  5 files changed, 5 insertions(+), 12 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>> index 48c9b9b06905..9b35763ae0a7 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>> @@ -1111,8 +1111,6 @@ struct amdgpu_device {
> >>>>>>     bool                            in_s3;
> >>>>>>     bool                            in_s4;
> >>>>>>     bool                            in_s0ix;
> >>>>>> -   /* indicate amdgpu suspension status */
> >>>>>> -   bool                            suspend_complete;
> >>>>>>
> >>>>>>     enum pp_mp1_state               mp1_state;
> >>>>>>     struct amdgpu_doorbell_index doorbell_index;
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> index 680e44fdee6e..78972151b970 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> @@ -2501,7 +2501,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
> >>>>>>     struct drm_device *drm_dev = dev_get_drvdata(dev);
> >>>>>>     struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >>>>>>
> >>>>>> -   adev->suspend_complete = false;
> >>>>>>     if (amdgpu_acpi_is_s0ix_active(adev))
> >>>>>>             adev->in_s0ix = true;
> >>>>>>     else if (amdgpu_acpi_is_s3_active(adev))
> >>>>>> @@ -2516,7 +2515,6 @@ static int amdgpu_pmops_suspend_noirq(struct device *dev)
> >>>>>>     struct drm_device *drm_dev = dev_get_drvdata(dev);
> >>>>>>     struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >>>>>>
> >>>>>> -   adev->suspend_complete = true;
> >>>>>>     if (amdgpu_acpi_should_gpu_reset(adev))
> >>>>>>             return amdgpu_asic_reset(adev);
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >>>>>> index be320d753507..ba8e66744376 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >>>>>> @@ -3276,8 +3276,8 @@ static int gfx_v9_0_cp_gfx_start(struct amdgpu_device *adev)
> >>>>>>      * confirmed that the APU gfx10/gfx11 needn't such update.
> >>>>>>      */
> >>>>>>     if (adev->flags & AMD_IS_APU &&
> >>>>>> -                   adev->in_s3 && !adev->suspend_complete) {
> >>>>>> -           DRM_INFO(" Will skip the CSB packet resubmit\n");
> >>>>>> +                   adev->in_s3 && !pm_resume_via_firmware()) {
> >>>>>> +           DRM_INFO("Will skip the CSB packet resubmit\n");
> >>>>>>             return 0;
> >>>>>>     }
> >>>>>>     r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> index 12ff6cf568dc..d9d11131a744 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> @@ -584,13 +584,10 @@ static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> >>>>>>      *    performing pm core test.
> >>>>>>      */
> >>>>>>     if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> >>>>>> -                   !pm_resume_via_firmware()) {
> >>>>>> -           adev->suspend_complete = false;
> >>>>>> +                   !pm_resume_via_firmware())
> >>>>>>             return true;
> >>>>>> -   } else {
> >>>>>> -           adev->suspend_complete = true;
> >>>>>> +   else
> >>>>>>             return false;
> >>>>>> -   }
> >>>>>>  }
> >>>>>>
> >>>>>>  static int soc15_asic_reset(struct amdgpu_device *adev)
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> index c4b950e75133..7a47a21ef00f 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> @@ -904,7 +904,7 @@ static bool soc21_need_reset_on_resume(struct amdgpu_device *adev)
> >>>>>>      * 2) S3 suspend got aborted and TOS is active.
> >>>>>>      */
> >>>>>>     if (!(adev->flags & AMD_IS_APU) && adev->in_s3 &&
> >>>>>> -       !adev->suspend_complete) {
> >>>>>> +       !pm_resume_via_firmware()) {
> >>>>>
> >>>>> Looks like this will cover only ACPI based systems. Not sure if
> >>>>> that assumption is valid for dGPU cases.
> >>>>>
> >>>>> Thanks,
> >>>>> Lijo
> >>>>
> >>>> Yes, the pm_set_resume_via_firmware() function is only called
> >>>> during the ACPI_STATE_S3 suspend process. However, ACPI-enabled
> >>>> systems are popular in the desktop world. If there are concerns
> >>>> about ACPI configuration, one option could be to check if the dGPU
> >>>> needs a reset by directly checking the SOL register. As far as I
> >>>> can see, when the dGPU completes its suspend process, the SOL value
> >>>> will remain zero until the dGPU is resumed. Conversely, in the case
> >>>> of a suspend abort, the SOL value will be non-zero.
> >>>>
> >>>
> >>> in_s3 is set for dGPU in case of s0ix as well. Probably, that's the
> >>> only case where we need the flag to avoid an unnecessary reset.
> >>> Otherwise the SOL check could be sufficient.
> >>>
> >>> Thanks,
> >>> Lijo
> >>>
> >> Do you mean we need to include S0ix to reset the dGPU during an S0ix
> >> suspend abort? However, the in_s0ix state of the dGPU should always be
> >> false, and there is no specific suspension handler for the dGPU in
> >> S0ix. As a PCIe endpoint, the dGPU should be powered off during
> >> system-wide (S0ix and Sx) suspend, and the SOL will be reset to 0
> >> during the suspend process. So, for the dGPU resume case, do you think
> >> it's enough to detect the suspend abort event by only checking SOL
> >> without any Sx filter?
> >>
> > For S0ix, I don't think there is a requirement to turn off all
> > endpoints

Yes, that only requires the _DSM-listed devices to reach their deepest sleep level.

> > - for dGPUs that don't support D3 or runpm etc.
>
> To clarify, "don't support D3" => what I meant is when D3 entry is prevented.

IMO this case can be filtered out by checking the SOL register value.
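
To make the idea concrete, here is a minimal sketch of a SOL-only check. It is not driver code: sol_read_fn, sol_indicates_suspend_abort() and mock_read_sol() are hypothetical names used only for illustration, with the reader callback standing in for a register read such as RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81). It assumes the behavior described earlier in this thread, i.e. SOL stays zero once the dGPU has completed its suspend and is non-zero after a suspend abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical register reader, standing in for something like
 * RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81). Not the amdgpu API.
 */
typedef uint32_t (*sol_read_fn)(void *ctx);

/*
 * Assumption taken from the discussion: SOL remains zero after the
 * dGPU completed its suspend, and is non-zero if the suspend was
 * aborted. Under that assumption no Sx/S0ix filter is applied here.
 */
static bool sol_indicates_suspend_abort(sol_read_fn read_sol, void *ctx)
{
	assert(read_sol != NULL);
	return read_sol(ctx) != 0;
}

/* Mock reader for illustration: returns the value ctx points to. */
static uint32_t mock_read_sol(void *ctx)
{
	return *(const uint32_t *)ctx;
}
```

With the mock reader, a SOL value of 0 reports "no abort, no reset needed" and any non-zero value reports "abort, reset needed". Whether this holds when D3 entry is prevented would still need to be confirmed on real hardware.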

> Thanks,
> Lijo
>
> > Then even if S0ix entry got aborted, but the GPU is suspended
> > properly, there is no need to reset the device.
> >
> > Thanks,
> > Lijo
> >
> >>>> Thanks,
> >>>> Prike
> >>>>>
> >>>>>>             sol_reg1 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81);
> >>>>>>             msleep(100);
> >>>>>>             sol_reg2 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81);