In rare cases, a delayed destruction of a BO with a system resource
could stay in the workqueue until drain_workqueue() is called in
ttm_device_fini(). An attempt to free a resource from an already
released manager results in a NULL pointer dereference. Move the step
of draining and destroying the workqueue so it happens before the
ttm_sys_manager cleanup.

Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers")
Signed-off-by: Karolina Stolarek <karolina.stolarek@xxxxxxxxx>
---
Some background: I stumbled upon this issue when testing
ttm_bo_pipeline_gutting() with a BO with an active dma_resv fence. In
~2% of the runs, the delayed destruction of the ghost wouldn't happen
until the drain_workqueue() step. By the time man->func->free(man, *res)
got called via ttm_bo_cleanup_memtype_use(), the manager and its
functions were nowhere to be seen, resulting in a NULL pointer deref.

 drivers/gpu/drm/ttm/ttm_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 7726a72befc5..753126581620 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -232,6 +232,9 @@ void ttm_device_fini(struct ttm_device *bdev)
 	struct ttm_resource_manager *man;
 	unsigned i;
 
+	drain_workqueue(bdev->wq);
+	destroy_workqueue(bdev->wq);
+
 	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
 	ttm_resource_manager_set_used(man, false);
 	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
@@ -240,9 +243,6 @@ void ttm_device_fini(struct ttm_device *bdev)
 	list_del(&bdev->device_list);
 	mutex_unlock(&ttm_global_mutex);
 
-	drain_workqueue(bdev->wq);
-	destroy_workqueue(bdev->wq);
-
 	spin_lock(&bdev->lru_lock);
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		if (list_empty(&man->lru[0]))
-- 
2.25.1