Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

Matthew Brost <matthew.brost@xxxxxxxxx> · Thu, 21 Nov 2024 10:10:29 -0800

On Thu, Nov 21, 2024 at 10:06:54AM +0100, Philipp Stanner wrote:
> +CC Thomas Hellström
> 
> 
> On Mon, 2024-11-18 at 12:48 +0100, Christian König wrote:
> >  Am 15.11.24 um 17:07 schrieb Philipp Stanner:
> >  
> > >  
> > > On Fri, 2024-11-15 at 15:39 +0100, Christian König wrote:
> > >  
> > > >  
> > > >  Am 15.11.24 um 14:55 schrieb Philipp Stanner:
> > > >  
> > > >  
> > > > >  
> > > > > [SNIP] 
> > > > >  
> > > >  
> > >  
> > > >  
> > > > >  
> > > > > But you wouldn't want to aim for getting rid of struct
> > > > > drm_sched_job
> > > > > completely, or would you?
> > > > >  
> > > > >  
> > > >  
> > > >  
> > > >  No, the drm_sched_job structure was added to aid the single
> > > > producer
> > > > single consumer queue and so made it easier to put the jobs into
> > > > a
> > > > container.
> > > >  
> > > >  
> > > >  In my mind the full solution for running jobs looks like this:
> > > >  
> > > >  1. Driver enqueues with drm_sched_job_arm()
> > > >  2. job ends up in pending_list
> > > >  3. Sooner or later scheduler calls run_job()
> > > >  4. In return scheduler gets a dma_fence representing the
> > > > resulting
> > > > hardware operation.
> > > >  5, This fence is put into a container to keep around what the hw
> > > > is
> > > > actually executing.
> > > >  6. At some point the fence gets signaled informing the scheduler
> > > > that the next job can be pushed if enough credits are now
> > > > available.
> > > >  
> > > >  There is no free_job callback any more because after run_job is
> > > > called the scheduler is done with the job and only the dma_fence
> > > > which represents the actually HW operation is the object the
> > > > scheduler now works with.
> > > >  
> > > >  We would still need something like a kill_job callback which is
> > > > used
> > > > when an entity is released without flushing all jobs (see
> > > > drm_sched_entity_kill()), but that is then only used for a
> > > > specific
> > > > corner case.
> > > >  
> > >  
> > > Can't we just limit the scheduler's responsibility to telling the
> > > driver that it has to flush, and if it doesn't it's a bug?
> > >  
> >  
> >  We still need to remove the pending jobs from the scheduler if
> > flushing times out.
> >  
> >  Without timing out flushing and/or aborting when the process was
> > killed we run into the same problem again that we don't want ti block
> > on _fini().
> >    
> > >  
> > > >  
> > > >  Blocking for the cleanup in drm_sched_fini() then becomes a
> > > > trivial
> > > > dma_fence_wait() on the remaining unsignaled HW fences and
> > > > eventually
> > > > on the latest killed fence.
> > > >  
> > >  
> > > But that results in exactly the same situation as my waitque
> > > solution,
> > > doesn't it?
> > >  
> >  
> >  The key part is that dma_fence's have a documented requirement to
> > never block infinitely.
> >  
> >  
> > >  
> > > Blocking infinitely in drm_sched_fini():
> > > 
> > > If the driver doesn't guarantee that all fences will get signaled,
> > > then
> > > wait_event() in the waitque solution will block forever. The same
> > > with
> > > dma_fence_wait().
> > > 
> > > Or are you aiming at an interruptible dma_fence_wait(), thereby not
> > > blocking SIGKILL?
> > >  
> >  
> >  That is basically what drm_sched_entity_flush() is already doing.
> >  
> >  
> > >  
> > > That then would still result in leaks. So basically the same
> > > situation
> > > as with an interruptible drm_sched_flush() function.
> > > 
> > > (Although I agree that would probably be more elegant)
> > >  
> >  
> >  If the wait_event really would just waiting for dma_fences then yes.
> >  
> >  The problem with the waitqueue approach is that we need to wait for
> > the free_job callback and that callback is normally called from a
> > work item without holding any additional locks.
> >  
> >  When drm_sched_fini starts to block for that we create a rat-tail of
> > new dependencies since that one is most likely called from a file
> > descriptor destruction.
> >   
> >  
> > >  
> > > 
> > >  
> > > >  
> > > >  
> > > >  The problem with this approach is that the idea of re-submitting
> > > > jobs in a GPU reset by the scheduler is then basically dead. But
> > > > to
> > > > be honest that never worked correctly.
> > > >  
> > > >  See the deprecated warning I already put on
> > > > drm_sched_resubmit_jobs(). The job lifetime is not really well
> > > > defined because of this, see the free_guilty hack as well.
> > > >  
> > >  
> > > drm_sched_resubmit_jobs() is being used by 5 drivers currently. So
> > > if
> > > we go for this approach we have to port them, first. That doesn't
> > > sound
> > > trivial to me
> > >  
> >  
> >  Yeah, completely agree. I think the scheduler should provide
> > something like drm_sched_for_each_pending_fence() which helps to
> > iterate over all the pending HW fences.
> >  
> >  The individual drivers could then device by themself what to do,
> > e.g. upcast the HW fence into the job and push it again or similar.
> 
> I have talked with Dave and we would be interested in exploring the
> direction of getting rid of backend_ops.free_job() and doing the
> modifications this implies.
> 

Sorry again, catching up late.

Got any details to dropping free_job?

Gut reaction is this is fairly large subsystem rework, touching many
drivers, and great way to regress stable software. 

> Matthew, Thomas, any hard NACKs on principle, or can we look into this
> in a future patch series?
> 

Definietly not NACK outright, but see my concerns above. If ends up not
being all that invasive and good cleanup, not opposed. If it starts
messing with existing reset / job cancel / queue teardown flows I would
be concerned given how hard it is to get though correct.

Matt

> 
> P.
> 
> 
> 
> 
> 
> >  
> >  But what we really need to get away from are those fence pre-
> > requisite violations Sima pointed out. For example we can't allocate
> > memory for run_job.
> >  
> >  Regards,
> >  Christian.
> >  
> >  
> > >  
> > > 
> > > 
> > > P.
> > > 
> > >  
> > > >  
> > > >  
> > > >  Regards,
> > > >  Christian.
> > > >  
> > > >  
> > > >  
> > > > >  
> > > > >  
> > > > > 
> > > > > 
> > > > > Grüße,
> > > > > P.
> > > > > 
> > > > >  
> > > > >  
> > > >  
> > > >  
> > > >  
> > >   
> >  
> >  
>