Re: [PATCH v3 1/4] drm/amd: Add support for prepare() and complete() callbacks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/5/2023 07:35, Christian König wrote:
Am 04.10.23 um 05:39 schrieb Mario Limonciello:
On 10/3/2023 16:22, Deucher, Alexander wrote:
[Public]

-----Original Message-----
From: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
Sent: Tuesday, October 3, 2023 5:17 PM
To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-
gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Wentland, Harry <Harry.Wentland@xxxxxxx>
Subject: Re: [PATCH v3 1/4] drm/amd: Add support for prepare() and
complete() callbacks

On 10/3/2023 16:11, Deucher, Alexander wrote:
[Public]

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
Mario Limonciello
Sent: Tuesday, October 3, 2023 4:55 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Wentland, Harry <Harry.Wentland@xxxxxxx>; Limonciello, Mario
<Mario.Limonciello@xxxxxxx>
Subject: [PATCH v3 1/4] drm/amd: Add support for prepare() and
complete() callbacks

Linux PM core has a prepare() callback run before suspend and
complete() callback ran after resume() for devices to use.  Add
plumbing to bring
prepare() to amdgpu.

The idea with the new vfuncs for amdgpu is that all IP blocks that
memory allocations during suspend should do the allocation from this
call instead of the suspend() callback.

By moving the allocations to prepare() the system suspend will be
failed before any IP block has done any suspend code.

If the suspend fails, then do any cleanups in the complete() callback.

Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39
++++++++++++++++++++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 11 +++---
   3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 73e825d20259..5d651552822c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1415,6 +1415,8 @@ void amdgpu_driver_postclose_kms(struct
drm_device *dev,  void amdgpu_driver_release_kms(struct drm_device
*dev);

   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
+int amdgpu_device_prepare(struct drm_device *dev); void
+amdgpu_device_complete(struct drm_device *dev);
   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon); int
amdgpu_device_resume(struct drm_device *dev, bool fbcon);
   u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc); diff
--git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bad2b5577e96..f53cf675c3ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4259,6 +4259,43 @@ static int
amdgpu_device_evict_resources(struct
amdgpu_device *adev)
   /*
    * Suspend & resume.
    */
+/**
+ * amdgpu_device_prepare - prepare for device suspend
+ *
+ * @dev: drm dev pointer
+ *
+ * Prepare to put the hw in the suspend state (all asics).
+ * Returns 0 for success or an error on failure.
+ * Called at driver suspend.
+ */
+int amdgpu_device_prepare(struct drm_device *dev) {
+     struct amdgpu_device *adev = drm_to_adev(dev);
+     int r;
+
+     if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+             return 0;
+
+     adev->in_suspend = true;
+
+     return 0;
+}
+
+/**
+ * amdgpu_device_complete - complete the device after resume
+ *
+ * @dev: drm dev pointer
+ *
+ * Clean up any actions that the prepare step did.
+ * Called after driver resume.
+ */
+void amdgpu_device_complete(struct drm_device *dev) {
+     struct amdgpu_device *adev = drm_to_adev(dev);
+
+     adev->in_suspend = false;
+}
+
   /**
    * amdgpu_device_suspend - initiate device suspend
    *
@@ -4277,8 +4314,6 @@ int amdgpu_device_suspend(struct drm_device
*dev, bool fbcon)
        if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
                return 0;

-     adev->in_suspend = true;
-

We also set this to false in amdgpu_device_resume() so that should be fixed
up as well.  But, I'm not sure we want to move this out of
amdgpu_device_suspend().  There are places we use
amdgpu_device_suspend/resume() outside of pmops that also rely on these
being set.  Those places may need to be fixed up if we do. IIRC, the switcheroo
code uses this.

The big reason that I moved it from suspend() to prepare() was so that
amdgpu_device_evict_resources() was called with the context of it being set.

My thought process:
0) prepare() sets all the time
1) If prepare() fails complete() clears it.
2) If prepare() succeeds it remains set for suspend()
3) If suspend() succeeds it gets cleared at resume()
4) If resume() failed for some reason, it's cleared by complete().

Does it actually matter that it's set while evicting resources?

Shouldn't matter for evicting resources.  We even have debugfs nodes you can access to forcibly evict resources at runtime for testing memory pressure.

Then in that case I think what I'll do is put an extra call for amdgpu_device_evict_resources() in the prepare callback.

It shouldn't do any harm to call three times in the suspend sequence instead of two.

Yeah, I think you should move the first call from suspend to prepare.

Evacuating VRAM doesn't rely at all on the suspend flag, but the second evacuation only works after we turned of the hardware.

In the v4 of the series I've instead added an extra call to evict in prepare(). So it's 1x call in prepare() and then 2x call in suspend().


Regards,
Christian.



Alex

If so, maybe I'll just have both prepare() and suspend() set it universally and
both resume() and complete() clear it universally.



Alex

        /* Evict the majority of BOs before grabbing the full access */
        r = amdgpu_device_evict_resources(adev);
        if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e3471293846f..4c6fb852516a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device
*dev)
        /* Return a positive number here so
         * DPM_FLAG_SMART_SUSPEND works properly
         */
-     if (amdgpu_device_supports_boco(drm_dev))
-             return pm_runtime_suspended(dev);
+     if (amdgpu_device_supports_boco(drm_dev) &&
+         pm_runtime_suspended(dev))
+             return 1;

        /* if we will not support s3 or s2i for the device
         *  then skip suspend
@@ -2435,12 +2436,14 @@ static int amdgpu_pmops_prepare(struct
device
*dev)
            !amdgpu_acpi_is_s3_active(adev))
                return 1;

-     return 0;
+     return amdgpu_device_prepare(drm_dev);
   }

   static void amdgpu_pmops_complete(struct device *dev) {
-     /* nothing to do */
+     struct drm_device *drm_dev = dev_get_drvdata(dev);
+
+     amdgpu_device_complete(drm_dev);
   }

   static int amdgpu_pmops_suspend(struct device *dev)
--
2.34.1








[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux