+Matthew who's been working on the kthread -> wq transition.

On Tue, 2 May 2023 13:36:07 +0200
Christian König <christian.koenig@xxxxxxx> wrote:

> Hi Boris,
>
> On 02.05.23 at 13:19, Boris Brezillon wrote:
> > Hello Christian, Alex,
> >
> > As part of our transition to drm_sched for the powervr GPU driver, we
> > realized drm_sched_resubmit_jobs(), which is used by all drivers
> > relying on drm_sched right now except amdgpu, has been deprecated.
> > Unfortunately, commit 5efbe6aa7a0e ("drm/scheduler: deprecate
> > drm_sched_resubmit_jobs") doesn't describe what drivers should do or
> > use as an alternative.
> >
> > At the very least, for our implementation, we need to restore the
> > drm_sched_job::parent pointers that were set to NULL in
> > drm_sched_stop(), such that jobs submitted before the GPU recovery are
> > considered active when drm_sched_start() is called. That could be done
> > with a custom pending_list iteration restoring drm_sched_job::parent,
> > but it seems odd to let the scheduler backend manipulate this list
> > directly, and I suspect we need other checks too, like the karma vs
> > hang-limit accounting, so we can flag the entity dirty and cancel all
> > jobs queued there if the entity has caused too many hangs.
> >
> > Now that drm_sched_resubmit_jobs() has been deprecated, it would be
> > great if you could help us write a piece of documentation describing
> > what should be done between drm_sched_stop() and drm_sched_start(), so
> > new drivers don't come up with their own slightly different/broken
> > version of the same thing.
>
> Yeah, really good point! The solution is to not use drm_sched_stop() and
> drm_sched_start() either.
>
> The general idea Daniel, the other Intel guys and I seem to have agreed
> on is to convert the scheduler thread into a work item.
>
> This work item for pushing jobs to the hw can then be queued to the same
> workqueue we use for the timeout work item.
>
> If this workqueue is now configured by your driver as single threaded,
> you have a guarantee that only either the scheduler or the timeout work
> item is running at any given time. That in turn makes starting/stopping
> the scheduler for a reset completely superfluous.
>
> Patches for this have already been floating on the mailing list, but
> haven't been committed yet, since this is all WIP.

As I said previously, our work is based on Matthew's patch series
converting the drm_sched threads to a workqueue-based implementation.
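Just to make sure we're looking at the same thing, here's a rough sketch
of my understanding of that proposal. All the my_sched_*() names below are
hypothetical driver-side placeholders, not an actual drm_sched API:

/* Sketch of the "one ordered workqueue shared by the run-job and timeout
 * work items" idea; my_sched_*() are placeholders for driver logic.
 */
#include <linux/workqueue.h>

struct my_sched {
	struct workqueue_struct *wq;	/* ordered: at most one work item runs */
	struct work_struct run_work;	/* replaces the drm_sched kthread */
	struct work_struct tdr_work;	/* timeout/reset handling */
};

/* Placeholders for the actual job-dequeue and reset logic. */
static bool my_sched_select_and_push_job(struct my_sched *sched);
static void my_sched_reset_hw(struct my_sched *sched);

static void my_sched_run_work(struct work_struct *work)
{
	struct my_sched *sched = container_of(work, struct my_sched, run_work);

	/* Dequeue jobs while deps are signaled and ring slots are free. */
	while (my_sched_select_and_push_job(sched))
		;
}

static void my_sched_tdr_work(struct work_struct *work)
{
	struct my_sched *sched = container_of(work, struct my_sched, tdr_work);

	/*
	 * Since ->wq is ordered, this cannot run concurrently with
	 * my_sched_run_work(), so no drm_sched_stop()/drm_sched_start()
	 * dance is needed around the reset.
	 */
	my_sched_reset_hw(sched);
}

static int my_sched_init(struct my_sched *sched)
{
	/* Single-threaded (ordered) workqueue shared by both work items. */
	sched->wq = alloc_ordered_workqueue("my-sched", 0);
	if (!sched->wq)
		return -ENOMEM;

	INIT_WORK(&sched->run_work, my_sched_run_work);
	INIT_WORK(&sched->tdr_work, my_sched_tdr_work);

	/*
	 * Job submission/dep-signaling paths would then call
	 * queue_work(sched->wq, &sched->run_work), and the timeout handler
	 * queue_work(sched->wq, &sched->tdr_work).
	 */
	return 0;
}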
And I think there are a couple of shortcomings with the current
implementation if we do what you suggest (one single-threaded workqueue
used to serialize the GPU resets and job dequeuing/submission):

- drm_sched_main() has a loop dequeuing jobs until it can't (because a
  dep is not signaled, or because all ring slots are taken), not one job
  at a time. There can also be several 1:1 entities waiting for jobs to
  be dequeued ahead of the reset work. That means the reset operation
  might be delayed if we don't have a drm_sched_stop() calling
  cancel_work() and making sure we break out of that loop.

- with a workqueue (and that's even worse with a single-threaded one),
  the entity priorities are not enforced at the dequeuing level. The FW
  will still take care of execution priority, but you can have a low-prio
  drm_sched_entity getting pushed a lot of jobs, and with the
  drm_sched_main() loop, you'll only let other entities dequeue their
  jobs when the ringbuf of this low-prio entity is full or a job dep is
  not signaled. De-queuing one job at a time makes things better, but if
  you have a lot of 1:1 entities, it can still take some time until you
  reach the high-prio drm_sched worker.

I'm just wondering if you already have solutions to these problems.

Regards,

Boris