On Tue, Sep 27, 2022 at 1:57 PM Mario Limonciello <mario.limonciello@xxxxxxx> wrote: > > We previously had in place some behavior that would cause APU or GPU > to be reset during suspend entry. This caused various problems, and > eventually was reverted. There are however users that preferred this > behavior, so add an option that they can use to force it. You should never need to reset on suspend unless the platform is not actually going into a low power state. The reason the reset was added was as a workaround for aborted suspends so that we can put the GPU into a known good state if we need to re-initialize the hardware with the device not actually having been powered down. In most cases I've seen where a reset "helps" it's because the GPU was not actually powered off during suspend. Adding a workaround like this will just hide actual bugs or users will read somewhere that they need to set it and it will break suspend for them and then we'll get spurious bug reports. Alex > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216516 > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 2 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++++++++ > 3 files changed, 11 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > index ae9371b172e3a..85999f48e2835 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > @@ -230,6 +230,7 @@ extern bool pcie_p2p; > > extern int amdgpu_tmz; > extern int amdgpu_reset_method; > +extern int amdgpu_reset_on_suspend; > > #ifdef CONFIG_DRM_AMDGPU_SI > extern int amdgpu_si_support; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > index b14800ac179ee..17c0a0ec21bd6 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > @@ -1051,6 +1051,8 @@ bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) > */ > bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev) > { > + if (amdgpu_reset_on_suspend >= 0) > + return !!amdgpu_reset_on_suspend; > if (adev->flags & AMD_IS_APU) > return false; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 16f6a313335e9..6a786dacf2c2d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -179,6 +179,7 @@ int amdgpu_noretry = -1; > int amdgpu_force_asic_type = -1; > int amdgpu_tmz = -1; /* auto */ > int amdgpu_reset_method = -1; /* auto */ > +int amdgpu_reset_on_suspend = -1; /* auto */ > int amdgpu_num_kcq = -1; > int amdgpu_smartshift_bias; > int amdgpu_use_xgmi_p2p = 1; > @@ -870,6 +871,13 @@ module_param_named(tmz, amdgpu_tmz, int, 0444); > MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco/bamaco)"); > module_param_named(reset_method, amdgpu_reset_method, int, 0444); > > +/** > + * DOC: reset_on_suspend (int) > + * GPUs should be reset on suspend (-1 = auto (default), 0 = no, 1 = yes) > + */ > +MODULE_PARM_DESC(reset_on_suspend, "GPUs should be reset on suspend (-1 = auto (default), 0 = no 1 = yes"); > +module_param_named(reset_on_suspend, amdgpu_reset_on_suspend, int, 0444); > + > /** > * DOC: bad_page_threshold (int) Bad page threshold is specifies the > * threshold value of faulty pages detected by RAS ECC, which may > -- > 2.25.1 >