Am 13.03.25 um 11:26 schrieb Philipp Stanner: > On Thu, 2025-03-13 at 11:07 +0100, Christian König wrote: >> Am 13.03.25 um 10:30 schrieb Philipp Stanner: >>> The documentation for drm_sched_job_arm() and especially >>> drm_sched_job_cleanup() does not make it very clear why >>> drm_sched_job_arm() is a point of no return, which it indeed is. >>> >>> Make the nature of drm_sched_job_arm() in the docu as clear as >>> possible. >>> >>> Suggested-by: Christian König <christian.koenig@xxxxxxx> >>> Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx> >> Reviewed-by: Christian König <christian.koenig@xxxxxxx> > Thx > >> I'm currently looking into how to fix the amdgpu CS path for gang >> submission regarding this. >> >> Any objections that I add a preload function to allocate the memory >> for the XA outside of the critical section? > I can't fully follow, you mean when creating the dependencies xarray? > > I think in a perfect world we wouldn't have any more functions that > only have 1 user to work around driver-internal problems. That said, I > realize that the world isn't perfect. Would still be cool if every > function we add would be useful to >1 driver. > > I think it's probably best if you just propose sth in form of a patch > and we discuss it there Ok I will give that a try. It's tricky at best and I'm just now throwing away the second try to solve it. Let's see if approach number three works. Thanks, Christian. > > Grüße > P. > > >> Regards, >> Christian. >> >>> --- >>> drivers/gpu/drm/scheduler/sched_main.c | 24 ++++++++++++++++++---- >>> -- >>> 1 file changed, 18 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>> b/drivers/gpu/drm/scheduler/sched_main.c >>> index 4d4219fbe49d..829579c41c6b 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -828,11 +828,15 @@ EXPORT_SYMBOL(drm_sched_job_init); >>> * >>> * This arms a scheduler job for execution. Specifically it >>> initializes the >>> * &drm_sched_job.s_fence of @job, so that it can be attached to >>> struct dma_resv >>> - * or other places that need to track the completion of this job. >>> + * or other places that need to track the completion of this job. >>> It also >>> + * initializes sequence numbers, which are fundamental for fence >>> ordering. >>> * >>> * Refer to drm_sched_entity_push_job() documentation for locking >>> * considerations. >>> * >>> + * Once this function was called, you *must* submit @job with >>> + * drm_sched_entity_push_job(). >>> + * >>> * This can only be called if drm_sched_job_init() succeeded. >>> */ >>> void drm_sched_job_arm(struct drm_sched_job *job) >>> @@ -1017,9 +1021,12 @@ EXPORT_SYMBOL(drm_sched_job_has_dependency); >>> * Drivers should call this from their error unwind code if @job >>> is aborted >>> * before drm_sched_job_arm() is called. >>> * >>> - * After that point of no return @job is committed to be executed >>> by the >>> - * scheduler, and this function should be called from the >>> - * &drm_sched_backend_ops.free_job callback. >>> + * drm_sched_job_arm() is a point of no return since it >>> initializes the fences >>> + * and their sequence number etc. Once that function has been >>> called, you *must* >>> + * submit it with drm_sched_entity_push_job() and cannot simply >>> abort it by >>> + * calling drm_sched_job_cleanup(). >>> + * >>> + * This function should be called in the >>> &drm_sched_backend_ops.free_job callback. >>> */ >>> void drm_sched_job_cleanup(struct drm_sched_job *job) >>> { >>> @@ -1027,10 +1034,15 @@ void drm_sched_job_cleanup(struct >>> drm_sched_job *job) >>> unsigned long index; >>> >>> if (kref_read(&job->s_fence->finished.refcount)) { >>> - /* drm_sched_job_arm() has been called */ >>> + /* The job has been processed by the scheduler, >>> i.e., >>> + * drm_sched_job_arm() and >>> drm_sched_entity_push_job() have >>> + * been called. >>> + */ >>> dma_fence_put(&job->s_fence->finished); >>> } else { >>> - /* aborted job before committing to run it */ >>> + /* The job was aborted before it has been >>> committed to be run; >>> + * notably, drm_sched_job_arm() has not been >>> called. >>> + */ >>> drm_sched_fence_free(job->s_fence); >>> } >>>