Re: [Patch v3 3/4] drm/amdkfd: refactor runtime pm for baco

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



One more nit-pick and one error-handling problem inline.

On 2020-02-06 7:09 p.m., Rajneesh Bhardwaj wrote:
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.

However, in an ideal case, when gpu is runtime  suspended the kfd driver
should be able to:

  - auto resume amdgpu driver whenever a client requests compute service
  - prevent runtime suspend for amdgpu  while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@xxxxxxx>
Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++----
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +--
  drivers/gpu/drm/amd/amdkfd/kfd_device.c    | 29 +++++++++-------
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h      |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 40 ++++++++++++++++++++--
  6 files changed, 68 insertions(+), 26 deletions(-)

[snip]
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 98dcbb96b2e2..6d6c25fe2677 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -31,6 +31,7 @@
  #include <linux/compat.h>
  #include <linux/mman.h>
  #include <linux/file.h>
+#include <linux/pm_runtime.h>
  #include "amdgpu_amdkfd.h"
  #include "amdgpu.h"
@@ -527,6 +528,16 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
  		kfree(pdd->qpd.doorbell_bitmap);
  		idr_destroy(&pdd->alloc_idr);
+ /*
+		 * before destroying pdd, make sure to report availability
+		 * for auto suspend
+		 */
+		if (pdd->runtime_inuse) {
+			pm_runtime_mark_last_busy(pdd->dev->ddev->dev);
+			pm_runtime_put_autosuspend(pdd->dev->ddev->dev);
+			pdd->runtime_inuse = false;
+		}
+
  		kfree(pdd);
  	}
  }
@@ -844,6 +855,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
  	pdd->process = p;
  	pdd->bound = PDD_UNBOUND;
  	pdd->already_dequeued = false;
+	pdd->runtime_inuse = false;
  	list_add(&pdd->per_device_list, &p->per_device_data);
/* Init idr used for memory handle translation */
@@ -933,15 +945,39 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
  		return ERR_PTR(-ENOMEM);
  	}
+ /*
+	 * signal runtime-pm system to auto resume and prevent
+	 * further runtime suspend once device pdd is created until
+	 * pdd is destroyed.
+	 */
+	if (!pdd->runtime_inuse) {
+		err = pm_runtime_get_sync(dev->ddev->dev);
+		if (err < 0)
+			return ERR_PTR(err);
+	}
+
  	err = kfd_iommu_bind_process_to_device(pdd);
  	if (err)
-		return ERR_PTR(err);
+		goto out;
err = kfd_process_device_init_vm(pdd, NULL);
  	if (err)
-		return ERR_PTR(err);
+		goto out;
+
+	if (!err)

This "if" is also redundant. If there was an error, you already did goto out. pdd->runtime_inuse should be set whenever we return successfully from this function, so logically there should be no extra "if".


+		/*
+		 * make sure that runtime_usage counter is incremented
+		 * just once per pdd
+		 */
+		pdd->runtime_inuse = true;
return pdd;
+
+out:
+	/* balance runpm reference count and exit with error */

I think you need an "if (!pdd->runtime_inuse)" here. If this function didn't call pm_runtime_get_sync above, you shouldn't do the cleanup below. Otherwise you risk getting unbalanced usage counters. In other words, you need to use the same condition for pm_runtime_get_sync and the cleanup.

Regards,
  Felix


+	pm_runtime_mark_last_busy(dev->ddev->dev);
+	pm_runtime_put_autosuspend(dev->ddev->dev);
+	return ERR_PTR(err);
  }
struct kfd_process_device *kfd_get_first_process_device_data(
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux