Re: [PATCH] Fix Incorrect VMIDs passed to HWS

On 2022-03-17 at 16:57, Tushar Patel wrote:
Removed dev_error message for incorrect VMIDs

Fix Incorrect VMIDs passed to HWS

This could use more of an explanation. The problem here was that the previous default was based on an outdated number of VMIDs. On Arcturus and Aldebaran we reserve more VMIDs for KFD, but that was never reflected in the maximum concurrency setting for HWS. This patch fixes that by making the default dependent on the number of VMIDs per GPU.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 +++---------
  2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 4c20c23d6ba0..bda1b5132ee8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -680,7 +680,7 @@ MODULE_PARM_DESC(sched_policy,
   * Maximum number of processes that HWS can schedule concurrently. The maximum is the
   * number of VMIDs assigned to the HWS, which is also the default.
   */
-int hws_max_conc_proc = 8;
+int hws_max_conc_proc = -1;
  module_param(hws_max_conc_proc, int, 0444);
  MODULE_PARM_DESC(hws_max_conc_proc,
  	"Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 339e12c94cff..66074e1abc79 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -483,15 +483,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
  	}
  	/* Verify module parameters regarding mapped process number*/
-	if ((hws_max_conc_proc < 0)
-			|| (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
-		dev_err(kfd_device,
-			"hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
-			hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
-			kfd->vm_info.vmid_num_kfd);
-		kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
-	} else
-		kfd->max_proc_per_quantum = hws_max_conc_proc;
+	kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;

I'd move that into an else-branch of the if-statement. That would make the logic clearer.


+	if (hws_max_conc_proc != -1)

Change this condition to "hws_max_conc_proc >= 0". We never want to set kfd->max_proc_per_quantum to something negative.
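For illustration, here is a minimal standalone sketch of the logic with both suggestions applied (the ">= 0" condition and the default moved into an else-branch). The function name pick_max_proc_per_quantum and its parameters are hypothetical stand-ins for the module parameter and kfd->vm_info.vmid_num_kfd. It is plain C rather than kernel code, so the kernel's min() is replaced by an explicit comparison.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: models how
 * kfd->max_proc_per_quantum could be chosen from the
 * hws_max_conc_proc parameter and the number of KFD VMIDs.
 */
uint32_t pick_max_proc_per_quantum(int hws_max_conc_proc,
				   uint32_t vmid_num_kfd)
{
	uint32_t result;

	if (hws_max_conc_proc >= 0) {
		/* User supplied a value: clamp it to the VMIDs available. */
		uint32_t requested = (uint32_t)hws_max_conc_proc;

		result = requested < vmid_num_kfd ? requested : vmid_num_kfd;
	} else {
		/* Negative (e.g. the -1 default): use the per-GPU maximum. */
		result = vmid_num_kfd;
	}

	return result;
}
```

With this shape, any negative parameter value falls back to the per-GPU VMID count, 0 still means no concurrency, and an oversized request is silently clamped instead of producing an error message.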

With those issues fixed, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>


+		kfd->max_proc_per_quantum = min(hws_max_conc_proc, kfd->vm_info.vmid_num_kfd);
  	/* calculate max size of mqds needed for queues */
  	size = max_num_of_queues_per_device *


