Re: [PATCH] drm/ttm: Schedule delayed_delete worker closer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023-11-07 14:45, Rajneesh Bhardwaj wrote:
When a TTM BO is getting freed, to optimize the clearing operation on
the workqueue, schedule it closer to a NUMA node where the memory was
allocated. This avoids the cases where the ttm_bo_delayed_delete gets
scheduled on the CPU cores that are across interconnect boundaries such
as xGMI, PCIe etc.

This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxx>

Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>


---
  drivers/gpu/drm/ttm/ttm_bo.c     | 10 +++++++++-
  drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
  2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 5757b9415e37..0d608441a112 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -370,7 +370,15 @@ static void ttm_bo_release(struct kref *kref)
  			spin_unlock(&bo->bdev->lru_lock);
INIT_WORK(&bo->delayed_delete, ttm_bo_delayed_delete);
-			queue_work(bdev->wq, &bo->delayed_delete);
+			/* Schedule the worker on the closest NUMA node, if no
+			 * CPUs are available, this falls back to any CPU core
+			 * available system wide. This helps avoid the
+			 * bottleneck to clear memory in cases where the worker
+			 * is scheduled on a CPU which is remote to the node
+			 * where the memory is getting freed.
+			 */
+
+			queue_work_node(bdev->pool.nid, bdev->wq, &bo->delayed_delete);
  			return;
  		}
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..72b81a2ee6c7 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -213,7 +213,8 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
  	bdev->funcs = funcs;
ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
bdev->vma_manager = vma_manager;
  	spin_lock_init(&bdev->lru_lock);



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux