Hi Philipp,
On 20/02/25 08:28, Philipp Stanner wrote:
The documentation for drm_sched_backend_ops.run_job() mentions a certain
function called drm_sched_job_recovery(). This function does not exist.
What's actually meant is drm_sched_resubmit_jobs(), which is by now also
deprecated.
Remove the mention of the removed function.
Discourage the behavior of drm_sched_backend_ops.run_job() being called
multiple times for the same job.
It looks odd to me that this patch removes lines that were added in
patch 1/3. Maybe you could reorder the patchset and make this one the
first patch.
Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
---
include/drm/gpu_scheduler.h | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 916279b5aa00..29e5bda91806 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -421,20 +421,27 @@ struct drm_sched_backend_ops {
/**
* @run_job: Called to execute the job once all of the dependencies
- * have been resolved. This may be called multiple times, if
- * timedout_job() has happened and drm_sched_job_recovery() decides to
- * try it again.
+ * have been resolved.
+ *
+ * The deprecated drm_sched_resubmit_jobs() (called from
+ * drm_sched_backend_ops.timedout_job()) can invoke this again with the
I think this should be "@timedout_job".
+ * same parameters. Using this is discouraged because it, presumably,
+ * violates dma_fence rules.
I believe this should be "struct dma_fence".
+ *
+ * TODO: Document which fence rules above.
*
* @sched_job: the job to run
*
- * Returns: dma_fence the driver must signal once the hardware has
- * completed the job ("hardware fence").
- *
* Note that the scheduler expects to 'inherit' its own reference to
* this fence from the callback. It does not invoke an extra
* dma_fence_get() on it. Consequently, this callback must take a
* reference for the scheduler, and additional ones for the driver's
* respective needs.
Would it be possible to add a comment stating that `run_job()` must check
whether `s_fence->finished.error` is non-zero? If you increase the karma
of a job and don't check `s_fence->finished.error`, you might run a
cancelled job.
+ *
+ * Return:
+ * * On success: dma_fence the driver must signal once the hardware has
+ * completed the job ("hardware fence").
A suggestion: "the fence that the driver must signal once the hardware
has completed the job".
Best Regards,
- Maíra
+ * * On failure: NULL or an ERR_PTR.
*/
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);