On Wed, Mar 05, 2025 at 02:05:52PM +0100, Philipp Stanner wrote: > drm_sched_backend_ops.timedout_job()'s documentation is outdated. It > mentions the deprecated function drm_sched_resubmit_jobs(). Furthermore, > it does not point out the important distinction between hardware and > firmware schedulers. > > Since firmware schedulers typically only use one entity per scheduler, > timeout handling is significantly more simple because the entity the > faulted job came from can just be killed without affecting innocent > processes. > > Update the documentation with that distinction and other details. > > Reformat the docstring to work to a unified style with the other > handles. > Looks really good, one suggestion. > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx> > --- > include/drm/gpu_scheduler.h | 78 ++++++++++++++++++++++--------------- > 1 file changed, 47 insertions(+), 31 deletions(-) > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 6381baae8024..1a7e377d4cbb 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -383,8 +383,15 @@ struct drm_sched_job { > struct xarray dependencies; > }; > > +/** > + * enum drm_gpu_sched_stat - the scheduler's status > + * > + * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use. > + * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded. > + * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore. > + */ > enum drm_gpu_sched_stat { > - DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */ > + DRM_GPU_SCHED_STAT_NONE, > DRM_GPU_SCHED_STAT_NOMINAL, > DRM_GPU_SCHED_STAT_ENODEV, > }; > @@ -447,43 +454,52 @@ struct drm_sched_backend_ops { > * @timedout_job: Called when a job has taken too long to execute, > * to trigger GPU recovery. > * > - * This method is called in a workqueue context. > + * @sched_job: The job that has timed out > * > - * Drivers typically issue a reset to recover from GPU hangs, and this > - * procedure usually follows the following workflow: > + * Drivers typically issue a reset to recover from GPU hangs. > + * This procedure looks very different depending on whether a firmware > + * or a hardware scheduler is being used. > * > - * 1. Stop the scheduler using drm_sched_stop(). This will park the > - * scheduler thread and cancel the timeout work, guaranteeing that > - * nothing is queued while we reset the hardware queue > - * 2. Try to gracefully stop non-faulty jobs (optional) > - * 3. Issue a GPU reset (driver-specific) > - * 4. Re-submit jobs using drm_sched_resubmit_jobs() > - * 5. Restart the scheduler using drm_sched_start(). At that point, new > - * jobs can be queued, and the scheduler thread is unblocked > + * For a FIRMWARE SCHEDULER, each ring has one scheduler, and each > + * scheduler has one entity. Hence, the steps taken typically look as > + * follows: > + * > + * 1. Stop the scheduler using drm_sched_stop(). This will pause the > + * scheduler workqueues and cancel the timeout work, guaranteeing > + * that nothing is queued while the ring is being removed. > + * 2. Remove the ring. The firmware will make sure that the > + * corresponding parts of the hardware are resetted, and that other > + * rings are not impacted. > + * 3. Kill the entity and the associated scheduler. Xe doesn't do step 3. It does: - Ban entity / scheduler so futures submissions are a NOP. This would be submissions with unmet dependencies. Submission at the IOCTL are disallowed - Signal all job's fences on the pending list - Restart scheduler so free_job() is naturally called I'm unsure if this how other firmware schedulers do this, but it seems to work quite well in Xe. Matt > + * > + * > + * For a HARDWARE SCHEDULER, a scheduler instance schedules jobs from > + * one or more entities to one ring. This implies that all entities > + * associated with the affected scheduler cannot be torn down, because > + * this would effectively also affect innocent userspace processes which > + * did not submit faulty jobs (for example). > + * > + * Consequently, the procedure to recover with a hardware scheduler > + * should look like this: > + * > + * 1. Stop all schedulers impacted by the reset using drm_sched_stop(). > + * 2. Kill the entity the faulty job stems from. > + * 3. Issue a GPU reset on all faulty rings (driver-specific). > + * 4. Re-submit jobs on all schedulers impacted by re-submitting them to > + * the entities which are still alive. > + * 5. Restart all schedulers that were stopped in step #1 using > + * drm_sched_start(). > * > * Note that some GPUs have distinct hardware queues but need to reset > * the GPU globally, which requires extra synchronization between the > - * timeout handler of the different &drm_gpu_scheduler. One way to > - * achieve this synchronization is to create an ordered workqueue > - * (using alloc_ordered_workqueue()) at the driver level, and pass this > - * queue to drm_sched_init(), to guarantee that timeout handlers are > - * executed sequentially. The above workflow needs to be slightly > - * adjusted in that case: > + * timeout handlers of different schedulers. One way to achieve this > + * synchronization is to create an ordered workqueue (using > + * alloc_ordered_workqueue()) at the driver level, and pass this queue > + * as drm_sched_init()'s @timeout_wq parameter. This will guarantee > + * that timeout handlers are executed sequentially. > * > - * 1. Stop all schedulers impacted by the reset using drm_sched_stop() > - * 2. Try to gracefully stop non-faulty jobs on all queues impacted by > - * the reset (optional) > - * 3. Issue a GPU reset on all faulty queues (driver-specific) > - * 4. Re-submit jobs on all schedulers impacted by the reset using > - * drm_sched_resubmit_jobs() > - * 5. Restart all schedulers that were stopped in step #1 using > - * drm_sched_start() > + * Return: The scheduler's status, defined by &enum drm_gpu_sched_stat > * > - * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal, > - * and the underlying driver has started or completed recovery. > - * > - * Return DRM_GPU_SCHED_STAT_ENODEV, if the device is no longer > - * available, i.e. has been unplugged. > */ > enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job *sched_job); > > -- > 2.48.1 >