+Matthew who's been working on the kthread -> wq transition.

On Tue, 2 May 2023 13:36:07 +0200
Christian König <christian.koenig@xxxxxxx> wrote:

> Hi Boris,
>
> On 02.05.23 at 13:19, Boris Brezillon wrote:
> > Hello Christian, Alex,
> >
> > As part of our transition to drm_sched for the powervr GPU driver, we
> > realized drm_sched_resubmit_jobs(), which is used by all drivers
> > relying on drm_sched right now except amdgpu, has been deprecated.
> > Unfortunately, commit 5efbe6aa7a0e ("drm/scheduler: deprecate
> > drm_sched_resubmit_jobs") doesn't describe what drivers should do or
> > use as an alternative.
> >
> > At the very least, for our implementation, we need to restore the
> > drm_sched_job::parent pointers that were set to NULL in
> > drm_sched_stop(), such that jobs submitted before the GPU recovery are
> > considered active when drm_sched_start() is called. That could be done
> > with a custom pending_list iteration restoring drm_sched_job::parent,
> > but it seems odd to let the scheduler backend manipulate this list
> > directly, and I suspect we need other checks too, like the karma vs
> > hang-limit accounting, so we can flag the entity dirty and cancel all
> > jobs queued there if the entity has caused too many hangs.
> >
> > Now that drm_sched_resubmit_jobs() has been deprecated, it would be
> > great if you could help us write a piece of documentation describing
> > what should be done between drm_sched_stop() and drm_sched_start(), so
> > new drivers don't come up with their own slightly different/broken
> > version of the same thing.
>
> Yeah, really good point! The solution is to not use drm_sched_stop() and
> drm_sched_start() either.
>
> The general idea Daniel, the other Intel guys and I seem to have agreed
> on is to convert the scheduler thread into a work item.
>
> This work item for pushing jobs to the hw can then be queued to the same
> workqueue we use for the timeout work item.
>
> If this workqueue is now configured by your driver as single threaded,
> you have a guarantee that only either the scheduler or the timeout work
> item is running at any given time. That in turn makes starting/stopping
> the scheduler for a reset completely superfluous.
>
> Patches for this have already been floating on the mailing list, but
> haven't been committed yet, since this is all WIP.

As I said previously, our work is based on Matthew's patch series
converting the drm_sched threads to a workqueue-based implementation.
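Just to make sure we're looking at the same thing, here's a rough sketch
of my understanding of that proposal. All the my_sched_*() names below are
hypothetical driver-side placeholders, not an actual drm_sched API:

/* Sketch of the "one ordered workqueue shared by the run-job and timeout
 * work items" idea; my_sched_*() are placeholders for driver logic.
 */
#include <linux/workqueue.h>

struct my_sched {
	struct workqueue_struct *wq;	/* ordered: at most one work item runs */
	struct work_struct run_work;	/* replaces the drm_sched kthread */
	struct work_struct tdr_work;	/* timeout/reset handling */
};

/* Placeholders for the actual job-dequeue and reset logic. */
static bool my_sched_select_and_push_job(struct my_sched *sched);
static void my_sched_reset_hw(struct my_sched *sched);

static void my_sched_run_work(struct work_struct *work)
{
	struct my_sched *sched = container_of(work, struct my_sched, run_work);

	/* Dequeue jobs while deps are signaled and ring slots are free. */
	while (my_sched_select_and_push_job(sched))
		;
}

static void my_sched_tdr_work(struct work_struct *work)
{
	struct my_sched *sched = container_of(work, struct my_sched, tdr_work);

	/*
	 * Since ->wq is ordered, this cannot run concurrently with
	 * my_sched_run_work(), so no drm_sched_stop()/drm_sched_start()
	 * dance is needed around the reset.
	 */
	my_sched_reset_hw(sched);
}

static int my_sched_init(struct my_sched *sched)
{
	/* Single-threaded (ordered) workqueue shared by both work items. */
	sched->wq = alloc_ordered_workqueue("my-sched", 0);
	if (!sched->wq)
		return -ENOMEM;

	INIT_WORK(&sched->run_work, my_sched_run_work);
	INIT_WORK(&sched->tdr_work, my_sched_tdr_work);

	/*
	 * Job submission/dep-signaling paths would then call
	 * queue_work(sched->wq, &sched->run_work), and the timeout handler
	 * queue_work(sched->wq, &sched->tdr_work).
	 */
	return 0;
}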
And I think there are a couple of shortcomings with the current
implementation if we do what you suggest (one single-threaded workqueue
used to serialize the GPU resets and job dequeuing/submission):

- drm_sched_main() has a loop dequeuing jobs until it can't (because a
  dep is not signaled, or because all ring slots are taken), not one job
  at a time. There can also be several 1:1 entities waiting for jobs to
  be dequeued ahead of the reset work. That means the reset operation
  might be delayed if we don't have a drm_sched_stop() calling
  cancel_work() and making sure we break out of that loop.

- with a workqueue (and that's even worse with a single-threaded one),
  the entity priorities are not enforced at the dequeuing level. The FW
  will still take care of execution priority, but you can have a low-prio
  drm_sched_entity getting pushed a lot of jobs, and with the
  drm_sched_main() loop, you'll only let other entities dequeue their
  jobs when the ringbuf of this low-prio entity is full or a job dep is
  not signaled. De-queuing one job at a time makes things better, but if
  you have a lot of 1:1 entities, it can still take some time until you
  reach the high-prio drm_sched worker.

I'm just wondering if you already have solutions to these problems.

Regards,

Boris