Re: [PATCH] drm/ttm: Drain workqueue before sys manager release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16.10.2023 13:03, Christian König wrote:
Am 13.10.23 um 16:34 schrieb Karolina Stolarek:
In rare cases, a delayed destruction of a BO with a system resource
could stay in the workqueue until drain_workqueue() is called in
ttm_device_fini(). An attempt to free a resource from an already
released manager results in NULL pointer dereference. Move the step
of draining and destroying the workqueue so it happens before the
ttm_sys_manager cleanup.

Good catch! But it's not only the drainage of the workqueue which is at the wrong place here.

Removing the device from the device list should come before destroying the system domain as well.

I wonder when this would cause problems. But of course, it makes sense
to fix both with one patch.

So I think you should probably rather move the chunk:

man = ttm_manager_type(bdev, TTM_PL_SYSTEM); ttm_resource_manager_set_used(man, false); ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);

after the destroy_workqueue() instead of the other way around.>
Apart from that looks good to me.

Thanks for taking a look and the suggestion, I'll send a v2 once I
finish testing the change. Also, I'll drop the Fixes tag, as it seems
the order issue has been there since the very beginning.

All the best,
Karolina


Thanks, Christian.


Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers") Signed-off-by: Karolina Stolarek <karolina.stolarek@xxxxxxxxx> --- Some background: I stumbled upon this issue when testing ttm_bo_pipeline_gutting() with BO with an active dma_resv fence. In ~2% of the runs, the delayed destruction of the ghost wouldn't happen until the drain_queue() step. man->func->free(man, *res) got called via ttm_bo_cleanup_memtype_use(), the manager and its functions were nowhere to be seen, resulting in a nulptr deref.

drivers/gpu/drm/ttm/ttm_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index
7726a72befc5..753126581620 100644 ---
a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -232,6 +232,9 @@ void ttm_device_fini(struct ttm_device *bdev) struct ttm_resource_manager *man; unsigned i; + drain_workqueue(bdev->wq);
+    destroy_workqueue(bdev->wq); + man = ttm_manager_type(bdev,
TTM_PL_SYSTEM); ttm_resource_manager_set_used(man, false); ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL); @@ -240,9 +243,6 @@ void ttm_device_fini(struct ttm_device *bdev) list_del(&bdev->device_list); mutex_unlock(&ttm_global_mutex); - drain_workqueue(bdev->wq); - destroy_workqueue(bdev->wq); - spin_lock(&bdev->lru_lock); for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) if (list_empty(&man->lru[0]))




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux