Am 13.03.25 um 10:30 schrieb Philipp Stanner: > The documentation for drm_sched_job_arm() and especially > drm_sched_job_cleanup() does not make it very clear why > drm_sched_job_arm() is a point of no return, which it indeed is. > > Make the nature of drm_sched_job_arm() in the docu as clear as possible. > > Suggested-by: Christian König <christian.koenig@xxxxxxx> > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx> Reviewed-by: Christian König <christian.koenig@xxxxxxx> I'm currently looking into how to fix the amdgpu CS path for gang submission regarding this. Any objections that I add a preload function to allocate the memory for the XA outside of the critical section? Regards, Christian. > --- > drivers/gpu/drm/scheduler/sched_main.c | 24 ++++++++++++++++++------ > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 4d4219fbe49d..829579c41c6b 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -828,11 +828,15 @@ EXPORT_SYMBOL(drm_sched_job_init); > * > * This arms a scheduler job for execution. Specifically it initializes the > * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv > - * or other places that need to track the completion of this job. > + * or other places that need to track the completion of this job. It also > + * initializes sequence numbers, which are fundamental for fence ordering. > * > * Refer to drm_sched_entity_push_job() documentation for locking > * considerations. > * > + * Once this function was called, you *must* submit @job with > + * drm_sched_entity_push_job(). > + * > * This can only be called if drm_sched_job_init() succeeded. > */ > void drm_sched_job_arm(struct drm_sched_job *job) > @@ -1017,9 +1021,12 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency); > * Drivers should call this from their error unwind code if @job is aborted > * before drm_sched_job_arm() is called. > * > - * After that point of no return @job is committed to be executed by the > - * scheduler, and this function should be called from the > - * &drm_sched_backend_ops.free_job callback. > + * drm_sched_job_arm() is a point of no return since it initializes the fences > + * and their sequence number etc. Once that function has been called, you *must* > + * submit it with drm_sched_entity_push_job() and cannot simply abort it by > + * calling drm_sched_job_cleanup(). > + * > + * This function should be called in the &drm_sched_backend_ops.free_job callback. > */ > void drm_sched_job_cleanup(struct drm_sched_job *job) > { > @@ -1027,10 +1034,15 @@ void drm_sched_job_cleanup(struct drm_sched_job *job) > unsigned long index; > > if (kref_read(&job->s_fence->finished.refcount)) { > - /* drm_sched_job_arm() has been called */ > + /* The job has been processed by the scheduler, i.e., > + * drm_sched_job_arm() and drm_sched_entity_push_job() have > + * been called. > + */ > dma_fence_put(&job->s_fence->finished); > } else { > - /* aborted job before committing to run it */ > + /* The job was aborted before it has been committed to be run; > + * notably, drm_sched_job_arm() has not been called. > + */ > drm_sched_fence_free(job->s_fence); > } >