RE: [v3] drm/amdgpu: reset asic after system-wide suspend aborted (v3)

"Liang, Prike" <Prike.Liang@xxxxxxx> · Fri, 3 Dec 2021 13:42:48 +0000



[Public]

> -----Original Message-----
> From: Alex Deucher <alexdeucher@xxxxxxxxx>
> Sent: Thursday, December 2, 2021 3:39 AM
> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
> Cc: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx list <amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Lazar, Lijo <Lijo.Lazar@xxxxxxx>; Huang,
> Ray <Ray.Huang@xxxxxxx>
> Subject: Re: [v3] drm/amdgpu: reset asic after system-wide suspend aborted
> (v3)
>
> On Wed, Dec 1, 2021 at 1:46 PM Limonciello, Mario
> <mario.limonciello@xxxxxxx> wrote:
> >
> > On 11/24/2021 23:48, Prike Liang wrote:
> > > Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
> > > suspend to keep AMDGPU in a clean reset state and that can avoid
> > > re-initialize device improperly error. Currently,we just always do
> > > asic reset in the amdgpu resume until sort out the PM abort case.
> > >
> > > v2: Remove incomplete PM abort flag and add GPU hive case check for
> > > GPU reset.
> > >
> > > v3: Some dGPU reset method not support at the early resume time and
> > > temprorary skip the dGPU case.
> > >
> > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
> > >   1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 7d4115d..f6e1a6a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct
> drm_device *dev, bool fbcon)
> > >       if (adev->in_s0ix)
> > >               amdgpu_gfx_state_change_set(adev,
> > > sGpuChangeState_D0Entry);
> > >
> > > +     /*TODO: In order to not let all-always asic reset affect resume latency
> > > +      * need sort out the case which really need asic reset in the resume
> process.
> > > +      * As to the known issue on the system suspend abort behind the
> AMDGPU suspend,
> > > +      * may can sort this case by checking struct suspend_stats which need
> exported
> > > +      * firstly.
> > > +      */
> > > +     if (adev->flags & AMD_IS_APU)
> > > +             amdgpu_asic_reset(adev);
> >
> > Ideally you only want this to happen on S3 right?  So shouldn't there
> > be an extra check for `amdgpu_acpi_is_s0ix_active`?
>
> Shouldn't matter on the resume side.  Only the suspend side.  If we reset in
> suspend, we'd end up disabling gfxoff.  On the resume side, it should safe,
> but the resume paths for various IPs probably are not adequate to deal with
> a reset for S0i3 since they don't re-init as much hardware.  So it's probably
> better to skip this for S0i3.
>
> Alex
>
There's may some IP suspend/resume sequence not program correctly and when abort the s2idle after AMDGPU suspend can see some other error in the SMU/SDMA resume. However, that seems different symptom in the S3 abort case and will handle it separately.

>
> >
> > >       /* post card */
> > >       if (amdgpu_device_need_post(adev)) {
> > >               r = amdgpu_device_asic_init(adev);
> > >
> >