On 10/3/2023 16:11, Deucher, Alexander wrote:
[Public]
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Mario
Limonciello
Sent: Tuesday, October 3, 2023 4:55 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Wentland, Harry <Harry.Wentland@xxxxxxx>; Limonciello, Mario
<Mario.Limonciello@xxxxxxx>
Subject: [PATCH v3 1/4] drm/amd: Add support for prepare() and complete()
callbacks
Linux PM core has a prepare() callback run before suspend and complete()
callback ran after resume() for devices to use. Add plumbing to bring
prepare() to amdgpu.
The idea with the new vfuncs for amdgpu is that all IP blocks that memory
allocations during suspend should do the allocation from this call instead of
the suspend() callback.
By moving the allocations to prepare() the system suspend will be failed before
any IP block has done any suspend code.
If the suspend fails, then do any cleanups in the complete() callback.
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39
++++++++++++++++++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +++---
3 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 73e825d20259..5d651552822c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1415,6 +1415,8 @@ void amdgpu_driver_postclose_kms(struct
drm_device *dev, void amdgpu_driver_release_kms(struct drm_device *dev);
int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
+int amdgpu_device_prepare(struct drm_device *dev); void
+amdgpu_device_complete(struct drm_device *dev);
int amdgpu_device_suspend(struct drm_device *dev, bool fbcon); int
amdgpu_device_resume(struct drm_device *dev, bool fbcon);
u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc); diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bad2b5577e96..f53cf675c3ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4259,6 +4259,43 @@ static int amdgpu_device_evict_resources(struct
amdgpu_device *adev)
/*
* Suspend & resume.
*/
+/**
+ * amdgpu_device_prepare - prepare for device suspend
+ *
+ * @dev: drm dev pointer
+ *
+ * Prepare to put the hw in the suspend state (all asics).
+ * Returns 0 for success or an error on failure.
+ * Called at driver suspend.
+ */
+int amdgpu_device_prepare(struct drm_device *dev) {
+ struct amdgpu_device *adev = drm_to_adev(dev);
+ int r;
+
+ if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+ return 0;
+
+ adev->in_suspend = true;
+
+ return 0;
+}
+
+/**
+ * amdgpu_device_complete - complete the device after resume
+ *
+ * @dev: drm dev pointer
+ *
+ * Clean up any actions that the prepare step did.
+ * Called after driver resume.
+ */
+void amdgpu_device_complete(struct drm_device *dev) {
+ struct amdgpu_device *adev = drm_to_adev(dev);
+
+ adev->in_suspend = false;
+}
+
/**
* amdgpu_device_suspend - initiate device suspend
*
@@ -4277,8 +4314,6 @@ int amdgpu_device_suspend(struct drm_device
*dev, bool fbcon)
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
- adev->in_suspend = true;
-
We also set this to false in amdgpu_device_resume() so that should be fixed up as well. But, I'm not sure we want to move this out of amdgpu_device_suspend(). There are places we use amdgpu_device_suspend/resume() outside of pmops that also rely on these being set. Those places may need to be fixed up if we do. IIRC, the switcheroo code uses this.
The big reason that I moved it from suspend() to prepare() was so that
amdgpu_device_evict_resources() was called with the context of it being set.
My thought process:
0) prepare() sets all the time
1) If prepare() fails complete() clears it.
2) If prepare() succeeds it remains set for suspend()
3) If suspend() succeeds it gets cleared at resume()
4) If resume() failed for some reason, it's cleared by complete().
Does it actually matter that it's set while evicting resources?
If so, maybe I'll just have both prepare() and suspend() set it
universally and both resume() and complete() clear it universally.
Alex
/* Evict the majority of BOs before grabbing the full access */
r = amdgpu_device_evict_resources(adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e3471293846f..4c6fb852516a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device
*dev)
/* Return a positive number here so
* DPM_FLAG_SMART_SUSPEND works properly
*/
- if (amdgpu_device_supports_boco(drm_dev))
- return pm_runtime_suspended(dev);
+ if (amdgpu_device_supports_boco(drm_dev) &&
+ pm_runtime_suspended(dev))
+ return 1;
/* if we will not support s3 or s2i for the device
* then skip suspend
@@ -2435,12 +2436,14 @@ static int amdgpu_pmops_prepare(struct device
*dev)
!amdgpu_acpi_is_s3_active(adev))
return 1;
- return 0;
+ return amdgpu_device_prepare(drm_dev);
}
static void amdgpu_pmops_complete(struct device *dev) {
- /* nothing to do */
+ struct drm_device *drm_dev = dev_get_drvdata(dev);
+
+ amdgpu_device_complete(drm_dev);
}
static int amdgpu_pmops_suspend(struct device *dev)
--
2.34.1