Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

On 20.09.24 at 10:57, Philipp Stanner wrote:
> On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote:
>> Tearing down the scheduler with jobs still on the pending list can
>> lead to use after free issues. Add a warning if drivers try to
>> destroy a scheduler which still has work pushed to the HW.
> Did you have time yet to look into my proposed waitqueue solution?

I don't remember seeing anything. What have I missed?


>> When there are still entities with jobs the situation is even worse,
>> since the dma_fences for those jobs can never signal; we can then only
>> choose between potentially locking up core memory management and
>> random memory corruption. When drivers really mess it up that badly,
>> let them run into a BUG_ON().
>>
>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
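To make that dependency concrete: under memory pressure, eviction has to
wait for every fence attached to a buffer before its backing pages can be
freed. A minimal sketch of such a wait (not part of the patch;
dma_resv_wait_timeout() is the real interface, the function around it is
made up):

#include <linux/dma-resv.h>
#include <linux/sched.h>

/* Sketch: reclaim blocks until every fence on the BO has signaled. */
static int evict_bo_sketch(struct dma_resv *resv)
{
	long ret;

	/* A job whose scheduler was destroyed before it could run leaves
	 * its fence unsignaled forever, so this wait never finishes and
	 * reclaim stops making progress.
	 */
	ret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
				    false, MAX_SCHEDULE_TIMEOUT);
	return ret < 0 ? ret : 0;
}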

>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index f093616fe53c..8a46fab5cdc8 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> I agree with Sima that it should first be documented in the function's
> docstring what the user is expected to have done before calling the
> function.

Good point, going to update the documentation as well.
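
Something along these lines, perhaps (a rough sketch of the kernel-doc
wording, not the final text):

/**
 * drm_sched_fini - Destroy a gpu scheduler
 * @sched: scheduler instance
 *
 * Tears down and cleans up the scheduler.
 *
 * Callers must make sure that no job can still be pushed to the HW,
 * e.g. by calling drm_sched_entity_destroy() for all entities, and must
 * wait for the hardware fences of all jobs still on the pending list
 * before calling this function.
 */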

Thanks,
Christian.


> P.

>>  	drm_sched_wqueue_stop(sched);
>>
>> +	/*
>> +	 * Tearing down the scheduler while there are still unprocessed
>> +	 * jobs can lead to use after free issues in the scheduler fence.
>> +	 */
>> +	WARN_ON(!list_empty(&sched->pending_list));
>> +
>>  	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
>>  		struct drm_sched_rq *rq = sched->sched_rq[i];
>>
>>  		spin_lock(&rq->lock);
>> -		list_for_each_entry(s_entity, &rq->entities, list)
>> +		list_for_each_entry(s_entity, &rq->entities, list) {
>> +			/*
>> +			 * The justification for this BUG_ON() is that tearing
>> +			 * down the scheduler while jobs are pending leaves
>> +			 * dma_fences unsignaled. Since we have dependencies
>> +			 * from the core memory management to eventually signal
>> +			 * dma_fences, this can trivially lead to a system wide
>> +			 * stop because of a locked up memory management.
>> +			 */
>> +			BUG_ON(spsc_queue_count(&s_entity->job_queue));
>> +
>>  			/*
>>  			 * Prevents reinsertion and marks job_queue as idle,
>>  			 * it will be removed from rq in drm_sched_entity_fini
>>  			 * eventually
>>  			 */
>>  			s_entity->stopped = true;
>> +		}
>>  		spin_unlock(&rq->lock);
>>  		kfree(sched->sched_rq[i]);
>>  	}
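
For illustration, the teardown ordering a driver is expected to follow so
that neither the WARN_ON() nor the BUG_ON() fires could look like this
sketch (the my_* names are made up; only the drm_sched_*() and
dma_fence_wait() calls are the real API):

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

/* Hypothetical driver state, for illustration only. */
struct my_device {
	struct drm_gpu_scheduler sched;
	struct drm_sched_entity entity;
	struct dma_fence *last_fence;
};

static void my_device_fini(struct my_device *mdev)
{
	/* 1. Tear down the entities first so no new jobs can be pushed
	 *    and each entity's job_queue is flushed; this is what keeps
	 *    the BUG_ON() from triggering.
	 */
	drm_sched_entity_destroy(&mdev->entity);

	/* 2. Wait for the hardware fence of the last submitted job so
	 *    that the pending list drains and the WARN_ON() stays
	 *    silent.
	 */
	dma_fence_wait(mdev->last_fence, false);

	/* 3. Only now is it safe to destroy the scheduler. */
	drm_sched_fini(&mdev->sched);
}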



