On Mon, 2025-03-03 at 17:13 +0000, Tvrtko Ursulin wrote: > > On 03/03/2025 14:32, Philipp Stanner wrote: > > drm_sched_job_cleanup()'s documentation claims that calling > > drm_sched_job_arm() is a "point of no return", implying that > > afterwards > > a job cannot be cancelled anymore. > > > > This is not correct, as proven by the function's code itself, which > > takes a previous call to drm_sched_job_arm() into account. In > > truth, the > > decisive factors are whether fences have been shared (e.g., with > > other > > processes) and if the job has been submitted to an entity already. > > > > Correct the wrong docstring. > > > > Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx> > > --- > > drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++---- > > 1 file changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > b/drivers/gpu/drm/scheduler/sched_main.c > > index c634993f1346..db9e08e6e455 100644 > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > @@ -1013,11 +1013,13 @@ > > EXPORT_SYMBOL(drm_sched_job_has_dependency); > > * Cleans up the resources allocated with drm_sched_job_init(). > > * > > * Drivers should call this from their error unwind code if @job > > is aborted > > - * before drm_sched_job_arm() is called. > > + * before it was submitted to an entity with > > drm_sched_entity_push_job(). > > * > > - * After that point of no return @job is committed to be executed > > by the > > - * scheduler, and this function should be called from the > > - * &drm_sched_backend_ops.free_job callback. > > + * Since calling drm_sched_job_arm() causes the job's fences to be > > initialized, > > + * it is up to the driver to ensure that fences that were exposed > > to external > > + * parties get signaled. drm_sched_job_cleanup() does not ensure > > this. > > + * > > + * This function must also be called in &struct > > drm_sched_backend_ops.free_job > > */ > > void drm_sched_job_cleanup(struct drm_sched_job *job) > > { > > I also do not see anything so special in the flows which would > disallow > "aborting" jobs after arming. And it definitely can happen in the > practice. It was probably just unclear kerneldoc. I thought about why they wrote that for a few months and the only reason I can imagine was that they were worried about the initialized fences being shared. > > What I would suggest to add to the patch is to change the comment in > drm_sched_job_cleanup() implementation: > > This one: > > /* aborted job before committing to run it */ > > Probably just change to: > > /* aborted before arming */ Yes, good catch. Will do. P. > > With that you can add my r-b. > > Regards, > > Tvrtko >