Re: [PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 05.10.23 um 16:59 schrieb Mario Limonciello:
On 10/5/2023 09:39, Christian König wrote:
Am 04.10.23 um 19:18 schrieb Mario Limonciello:
Linux PM core has a prepare() callback run before suspend.

If the system is under high memory pressure, the resources may need
to be evicted into swap instead.  If the storage backing for swap
is offlined during the suspend() step then such a call may fail.

So duplicate this step into prepare() to move evict majority of
resources while leaving all existing steps that put the GPU into a
low power state in suspend().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 +++++++++++++++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 +++---
  3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d23fb4b5ad95..6643d0ed6b1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1413,6 +1413,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
  void amdgpu_driver_release_kms(struct drm_device *dev);
  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
+int amdgpu_device_prepare(struct drm_device *dev);
  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
  int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
  u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bad2b5577e96..67acee569c08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4259,6 +4259,31 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
  /*
   * Suspend & resume.
   */
+/**
+ * amdgpu_device_prepare - prepare for device suspend
+ *
+ * @dev: drm dev pointer
+ *
+ * Prepare to put the hw in the suspend state (all asics).
+ * Returns 0 for success or an error on failure.
+ * Called at driver suspend.
+ */
+int amdgpu_device_prepare(struct drm_device *dev)
+{
+    struct amdgpu_device *adev = drm_to_adev(dev);
+    int r;
+
+    if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+        return 0;
+
+    /* Evict the majority of BOs before starting suspend sequence */
+    r = amdgpu_device_evict_resources(adev);
+    if (r)
+        return r;
+
+    return 0;
+}
+
  /**
   * amdgpu_device_suspend - initiate device suspend
   *
@@ -4279,7 +4304,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
      adev->in_suspend = true;
-    /* Evict the majority of BOs before grabbing the full access */
      r = amdgpu_device_evict_resources(adev);
      if (r)
          return r;

I would just completely drop this extra amdgpu_device_evict_resources() call now.

We have a second call which is used to evacuate firmware etc... after the hw has been shut down. That one can't move, but also shouldn't allocate that much memory.


The problem is that amdgpu_device_suspend() is also called from amdgpu_switcheroo_set_state() as well as a bunch of pmops sequences that I don't expect call prepare() like poweroff().

I would think we still want to evict resources at the beginning of amdgpu_device_suspend() for all of those.

So it's an extra call for the prepare() path but it should be harmless.

That's true, but I would rather say that we should then call amdgpu_device_prepare() in those call paths as well.

We might end up putting more stuff into the prepare call than just eviction VRAM.

Regards,
Christian.


Regards,
Christian.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e3471293846f..175167582db0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device *dev)
      /* Return a positive number here so
       * DPM_FLAG_SMART_SUSPEND works properly
       */
-    if (amdgpu_device_supports_boco(drm_dev))
-        return pm_runtime_suspended(dev);
+    if (amdgpu_device_supports_boco(drm_dev) &&
+        pm_runtime_suspended(dev))
+        return 1;
      /* if we will not support s3 or s2i for the device
       *  then skip suspend
@@ -2435,7 +2436,7 @@ static int amdgpu_pmops_prepare(struct device *dev)
          !amdgpu_acpi_is_s3_active(adev))
          return 1;
-    return 0;
+    return amdgpu_device_prepare(drm_dev);
  }
  static void amdgpu_pmops_complete(struct device *dev)






[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux