Re: drm/sched: Replacement for drm_sched_resubmit_jobs() is deprecated

Hi Boris,

On 02.05.23 at 13:19, Boris Brezillon wrote:
Hello Christian, Alex,

As part of our transition to drm_sched for the powervr GPU driver, we
realized that drm_sched_resubmit_jobs(), which is used by all drivers
relying on drm_sched except amdgpu, has been deprecated.
Unfortunately, commit 5efbe6aa7a0e ("drm/scheduler: deprecate
drm_sched_resubmit_jobs") doesn't describe what drivers should do or use
as an alternative.

At the very least, for our implementation, we need to restore the
drm_sched_job::parent pointers that were set to NULL in
drm_sched_stop(), such that jobs submitted before the GPU recovery are
considered active when drm_sched_start() is called. That could be done
with a custom pending_list iteration restoring the drm_sched_job::parent
pointers, but it seems odd to let the scheduler backend manipulate this
list directly, and I suspect we need to do other checks, like the karma
vs hang-limit comparison, so we can flag the entity guilty and cancel
all jobs queued to it if the entity has caused too many hangs.
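
For reference, a minimal sketch of such an iteration, essentially
mirroring what the deprecated drm_sched_resubmit_jobs() body did
(locking against a concurrently running scheduler and the karma/guilty
handling are elided; note the parent pointer actually lives on the
job's s_fence):

	struct drm_sched_job *s_job, *tmp;
	struct dma_fence *fence;

	/* Assumes drm_sched_stop() has been called, so the scheduler
	 * no longer touches pending_list concurrently. */
	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
		struct drm_sched_fence *s_fence = s_job->s_fence;

		/* Re-submit the job to the hw to get a fresh hw fence... */
		fence = sched->ops->run_job(s_job);

		if (IS_ERR_OR_NULL(fence)) {
			if (IS_ERR(fence))
				dma_fence_set_error(&s_fence->finished,
						    PTR_ERR(fence));
			s_fence->parent = NULL;
		} else {
			/* ...and restore it as the parent so that
			 * drm_sched_start() treats the job as active. */
			s_fence->parent = dma_fence_get(fence);
			/* Drop the reference run_job() returned. */
			dma_fence_put(fence);
		}
	}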

Now that drm_sched_resubmit_jobs() has been deprecated, it would be
great if you could help us write a piece of documentation describing
what should be done between drm_sched_stop() and drm_sched_start(), so
that new drivers don't come up with their own slightly different/broken
version of the same thing.

Yeah, really good point! The solution is to not use drm_sched_stop() and drm_sched_start() either.

The general idea Daniel, the other Intel guys, and I seem to have agreed on is to convert the scheduler thread into a work item.

This work item for pushing jobs to the hw can then be queued to the same workqueue we use for the timeout work item.

If your driver now configures this workqueue as single-threaded (i.e. an ordered workqueue), you have a guarantee that at most one of the scheduler work item and the timeout work item is running at any given time. That in turn makes starting/stopping the scheduler for a reset completely superfluous.
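
For example (a minimal sketch; the names mydev and my_sched_ops and the
numbers are made up, and the conversion of the job-pushing thread into a
work item is the part that is still WIP), with the current
drm_sched_init() you can already hand such an ordered workqueue to the
scheduler as its timeout_wq:

	struct workqueue_struct *wq;
	int ret;

	/* An ordered workqueue executes at most one work item at a
	 * time, so everything queued on it is implicitly serialized. */
	wq = alloc_ordered_workqueue("my-gpu-sched", 0);
	if (!wq)
		return -ENOMEM;

	ret = drm_sched_init(&mydev->sched, &my_sched_ops,
			     64,			/* hw_submission */
			     3,				/* hang_limit */
			     msecs_to_jiffies(500),	/* timeout */
			     wq,			/* timeout_wq */
			     NULL,			/* score */
			     "my-gpu", mydev->dev);

Once the scheduler's job-pushing work item lands and is queued on the
same workqueue, it can by construction never run concurrently with the
timeout handler.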

Patches for this have already been floating around on the mailing list, but haven't been committed yet, since this is all still WIP.

In general it's not a good idea to change the scheduler and hw fences during GPU reset/recovery. The dma_fence implementation has a pretty strict state transition model which clearly says that a dma_fence must never go back from signaled to unsignaled, and when you start messing with the fences during a reset, that is exactly what might happen.

What you can do instead is save your hw state and restart from the same location after handling the timeout.
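
A timedout_job handler following that approach could look roughly like
this (a sketch only; the my_hw_*() helpers stand in for whatever state
save/restore mechanism your hardware provides):

	static enum drm_gpu_sched_stat
	my_timedout_job(struct drm_sched_job *sched_job)
	{
		struct my_device *mydev = to_my_device(sched_job->sched);

		my_hw_save_state(mydev);	/* snapshot ring/context state */
		my_hw_reset(mydev);		/* recover the hung engine */
		my_hw_resume(mydev);		/* replay from the saved
						 * position; the existing hw
						 * fences are left alone and
						 * signal once the replayed
						 * work completes */

		return DRM_GPU_SCHED_STAT_NOMINAL;
	}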

Regards,
Christian.


Thanks in advance for your help.

Regards,

Boris



