On 11/26/24 09:37, Lazar, Lijo wrote:
On 11/26/2024 7:50 PM, Mario Limonciello wrote:
Based on https://gitlab.freedesktop.org/drm/amd/-/issues/3781
What if this callback takes care only of suspend case and leaves the
hibernate case to dpm_prepare callback?
Then hibernate would fail under memory pressure.
My take is this failure with hibernate is a userspace problem (whether
userspace decides to freeze the tasks or not). I think it's better that
we /try/ to do the eviction and notify them if userspace should be
changed.
Hmm, the logic is kind of inconsistent now.
For dGPUs, evict is required for s0ix, s3, s4 and attempted twice.
For APUs, evict is required for s4, but attempted only once.
Well the logic was inconsistent before already.
But FWIW it will still be attempted a second time for APUs.
The call sequence is:
1) PM notifier
amdgpu_device_pm_notifier()
->amdgpu_device_evict_resources()
Eviction for dGPU and APU in S4.
2) Prepare callback
amdgpu_pmops_prepare()
->amdgpu_device_prepare()
->->amdgpu_device_evict_resources()
Eviction for dGPU only.
3) Suspend callback
amdgpu_pmops_freeze()
->amdgpu_device_suspend()
->->amdgpu_device_evict_resources()
Eviction for dGPU and APU in S4.
So yes it's incongruent that prepare() doesn't include APU S4, but the
problem is you won't know at prepare() time whether it's S4 AFAICT.
Or is there way to tell at prepare() for S4?
Now this notifier comes in before that and tells that right?
Oh yeah, it does!
In that case we can set adev-in_s4 during the PM notifier and keep it
set unless there is a failure during prepare().
BTW, I didn't realize that there is a max of 3 attempts now (didn't
notice it being called in suspend also).
Yeah, there are other reasons suspend callback gets called so it's
better to keep that sequence identical.
Thanks,
Lijo
Thanks,
Lijo
Thanks,
Lijo
+ case PM_SUSPEND_PREPARE:
+ r = amdgpu_device_evict_resources(adev);
+ adev->in_s4 = false;
+ /*
+ * This is considered non-fatal at thie time because
+ * amdgpu_device_prepare() will also evict resources.
+ * See https://gitlab.freedesktop.org/drm/amd/-/issues/3781
+ */
+ if (r)
+ drm_warn(adev_to_drm(adev), "Failed to evict resources,
freeze active processes if problems occur\n");
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
/**
* amdgpu_device_prepare - prepare for device suspend
*