Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Tue, 12 Nov 2024 16:26:04 +0100

Am 08.11.24 um 05:00 schrieb Dave Airlie:

    
      On Wed, 18 Sept 2024 at 23:48, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:

      
        Tearing down the scheduler with jobs still on the pending list can
lead to use after free issues. Add a warning if drivers try to
destroy a scheduler which still has work pushed to the HW.

When there are still entities with jobs the situation is even worse
since the dma_fences for those jobs can never signal we can just
choose between potentially locking up core memory management and
random memory corruption. When drivers really mess it up that well
let them run into a BUG_ON().

      
      I've been talking a bit to Phillip about this offline.

I'm not sure we should ever be BUG_ON here, I think it might be an
extreme answer, considering we are saying blocking userspace to let
things finish is bad, I think hitting this would be much worse.
    
    

    Yeah, completely agree that this is the most extreme response.

    

    The problem is that the scheduler doesn't have much of a choice in
    this moment any more. What we can do is the following:

    

    1. Try to block for the queued up jobs in the entities to be
    processed and signaling their hardware fence.

    

        There is no guarantee that this won't deadlock and we are
    potentially blocking a SIGKILL here.

        We already tried it before and that turned out to be quite
    unstable.

    

    2. Signal all pending fences without actually executing anything.

    

        Because of the dma_fence design, pipe-lining work and only
    keeping the latest fence for each context in the containers that can
    potentially lead to random memory corruptions.

    

    3. Don't signal anything, just free up all jobs.

    

        That can trivially result in the core memory management waiting
    forever for things to make progress. Leading to complete system
    stops.

    

    So the choice you have are all really bad for the system as a whole.

    

    

    
      I'd rather we WARN_ON, and consider just saying screw it and block
userspace close.
    
    

    Going to make that a WARN_ON() if you insist, but IIRC Linus or Greg
    once summarized it as "A BUG() is only justified if you prevent even
    worse things from happening". If you ask me that's exactly the case
    here.

    

    
      If we really want to avoid the hang on close possibility (though I'm
mostly fine with that) then I think Sima's option to just keep a
reference on the driver, let userspace close and try and clean things
up on fences in the driver later.

I think this should be at least good for rust lifetimes.
    
    

    The problem with the rust lifetime is that it got the direction of
    the dependency wrong.

    

    A dma_fence represents an operation on the hardware, e.g. a direct
    access to some memory. This means that the dma_fence can only signal
    if the hardware tells the driver that the operation is completed.

    

    If a dma_fence should signals because some object is not referenced
    any more, for example because userspace closed it, than that clearly
    points to a fundamentally broken design.

    

    
      Having an explicit memory leak is bad, having a managed memory leak is
less bad, because at least all the memory is still pointed to by
something and managed, at a guess we'd have to keep the whole driver
and scheduler around, to avoid having to make special free functions.
Unless there is some concept like TTM ghost objects we could get away
with, but I think having to signal the fence means we should keep all
the pieces.
    
    

    Well I think memory leaks are more or less harmless. What we really
    really need to prevent are random memory corruptions.

    

    Those can run undetected under the radar for a very long time and
    cause massive damage without anybody being able to pinpoint where
    corruptions came from.

    

    Regards,

    Christian.

    

    
      
Dave.


      
        Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
 drivers/gpu/drm/scheduler/sched_main.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f093616fe53c..8a46fab5cdc8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)

        drm_sched_wqueue_stop(sched);

+       /*
+        * Tearing down the scheduler wile there are still unprocessed jobs can
+        * lead to use after free issues in the scheduler fence.
+        */
+       WARN_ON(!list_empty(&sched->pending_list));
+
        for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
                struct drm_sched_rq *rq = sched->sched_rq[i];

                spin_lock(&rq->lock);
-               list_for_each_entry(s_entity, &rq->entities, list)
+               list_for_each_entry(s_entity, &rq->entities, list) {
+                       /*
+                        * The justification for this BUG_ON() is that tearing
+                        * down the scheduler while jobs are pending leaves
+                        * dma_fences unsignaled. Since we have dependencies
+                        * from the core memory management to eventually signal
+                        * dma_fences this can trivially lead to a system wide
+                        * stop because of a locked up memory management.
+                        */
+                       BUG_ON(spsc_queue_count(&s_entity->job_queue));
+
                        /*
                         * Prevents reinsertion and marks job_queue as idle,
                         * it will removed from rq in drm_sched_entity_fini
                         * eventually
                         */
                        s_entity->stopped = true;
+               }
                spin_unlock(&rq->lock);
                kfree(sched->sched_rq[i]);
        }
--
2.34.1