On 1/22/2025 8:28 AM, Mario Limonciello wrote: > On 1/20/2025 23:45, Lazar, Lijo wrote: >> >> >> On 1/13/2025 9:10 AM, Jiang Liu wrote: >>> When GPU suspend is aborted, do the same for dGPU as APU to reset >>> soc15 asic. Otherwise it may cause following errors: >>> [ 547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper >>> [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110) >>> >>> [ 555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper >>> [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110) >>> [ 555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ >>> enable failed >>> [ 555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* >>> resume of IP block <gfx_v9_4_3> failed -110 >>> [ 555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume >>> failed (-110). >>> [ 555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns >>> -110 >>> [ 555.126966] PM: Device 0000:0a:00.0 failed to resume async: error >>> -110 >>> >>> This fix has been tested on Mi308X. >>> >>> Signed-off-by: Jiang Liu <gerry@xxxxxxxxxxxxxxxxx> >>> Tested-by: Shuo Liu <shuox.liu@xxxxxxxxxxxxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++----- >>> 1 file changed, 3 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/ >>> amd/amdgpu/soc15.c >>> index a59b4c36cad7..0e1daefd1a8e 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c >>> @@ -605,12 +605,10 @@ soc15_asic_reset_method(struct amdgpu_device >>> *adev) >>> static bool soc15_need_reset_on_resume(struct amdgpu_device *adev) >>> { >>> /* Will reset for the following suspend abort cases. >>> - * 1) Only reset on APU side, dGPU hasn't checked yet. >>> - * 2) S3 suspend aborted in the normal S3 suspend or >>> - * performing pm core test. >>> + * 1) S3 suspend aborted in the normal S3 suspend >>> + * 2) S3 suspend aborted in performing pm core test. >>> */ >>> - if (adev->flags & AMD_IS_APU && adev->in_s3 && >>> - !pm_resume_via_firmware()) >>> + if (adev->in_s3 && !pm_resume_via_firmware()) >>> return true; >> >> I don't think this can be applied to all environments. For ex: this may >> not be applicable for dGPUs combined with ARM CPUs. > > I looked through amdgpu_acpi cases and I'm not sure I agree with this. > On ARM side amdgpu_acpi_is_s3_active() should never return true because > ARM doesn't support PM_SUSPEND_MEM. > > This means that amdgpu_choose_low_power_state() shouldn't set is_s3 either. > > So I don't think this block will run on ARM side. ARM is not the fundamental issue. It's for any platform which supports suspend/resume outside of ACPI. The original patch is accepted only for APUs for that reason. Thanks, Lijo > > That being said, we might have other more fundamental issues to worry > about with suspend/resume than handling aborted suspend/resume when run > on non-x86 so the whole set of amdgpu suspend/resume code might need to > be revisited.