On Mon, 2025-03-10 at 13:25 +0100, Christian König wrote: > Am 10.03.25 um 13:11 schrieb Philipp Stanner: > > On Mon, 2025-03-10 at 08:44 +0100, Christian König wrote: > > > This reverts commit 44d2f310f008613c1dbe5e234c2cf2be90cbbfab. > > OK, your arguments with fence ordering are strong. Please update > > the > > commit message according to our discussion: > > > > > Sorry for the delayed response, I only stumbled over this now > > > while > > > going > > > over old mails and then re-thinking my reviewed by for this > > > change. > > Your RB hadn't even been applied (I merged before you gave it), so > > you > > can remove this first paragraph from the commit message > > > > > The function drm_sched_job_arm() is indeed the point of no > > > return. > > > The > > > background is that it is nearly impossible for the driver to > > > correctly > > > retract the fence and signal it in the order enforced by the > > > dma_fence > > > framework. > > > > > > The code in drm_sched_job_cleanup() is for the purpose to cleanup > > > after > > > the job was armed through drm_sched_job_arm() *and* processed by > > > the > > > scheduler. > > > > > > The correct approach for error handling in this situation is to > > > set > > > the > > > error on the fences and then push to the entity anyway. We can > > > certainly > > > improve the documentation, but removing the warning is clearly > > > not a > > > good > > > idea. > > This last paragraph, as per our discussion, seems invalid. We > > shouldn't > > have that in the commit log, so that it won't give later hackers > > browsing it wrong ideas and we end up with someone actually > > mengling > > with those fences. > > Sure, going to make those updates. I just wanted to give people a > possible direction to look into when they really run into a situation > where they need to abort some submission very late. > > Should I also clarify the comment in drm_sched_job_cleanup()? Shouldn't be done in a revert-commit I think. I'd say we revert for now (it's not a big loss, just a bit of docu) and then you (or maybe I) submit a new patch where we discuss Tvrtko's questions and then write down the rules and issues precisely in both commit message and the function's comments. P. > > Regards, > Christian. > > > > > Thx > > P. > > > > > Signed-off-by: Christian König <christian.koenig@xxxxxxx> > > > --- > > > drivers/gpu/drm/scheduler/sched_main.c | 12 +++++------- > > > 1 file changed, 5 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > index 53e6aec37b46..4d4219fbe49d 100644 > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > @@ -1015,13 +1015,11 @@ > > > EXPORT_SYMBOL(drm_sched_job_has_dependency); > > > * Cleans up the resources allocated with drm_sched_job_init(). > > > * > > > * Drivers should call this from their error unwind code if @job > > > is > > > aborted > > > - * before it was submitted to an entity with > > > drm_sched_entity_push_job(). > > > + * before drm_sched_job_arm() is called. > > > * > > > - * Since calling drm_sched_job_arm() causes the job's fences to > > > be > > > initialized, > > > - * it is up to the driver to ensure that fences that were > > > exposed to > > > external > > > - * parties get signaled. drm_sched_job_cleanup() does not ensure > > > this. > > > - * > > > - * This function must also be called in &struct > > > drm_sched_backend_ops.free_job > > > + * After that point of no return @job is committed to be > > > executed by > > > the > > > + * scheduler, and this function should be called from the > > > + * &drm_sched_backend_ops.free_job callback. > > > */ > > > void drm_sched_job_cleanup(struct drm_sched_job *job) > > > { > > > @@ -1032,7 +1030,7 @@ void drm_sched_job_cleanup(struct > > > drm_sched_job > > > *job) > > > /* drm_sched_job_arm() has been called */ > > > dma_fence_put(&job->s_fence->finished); > > > } else { > > > - /* aborted job before arming */ > > > + /* aborted job before committing to run it */ > > > drm_sched_fence_free(job->s_fence); > > > } > > > >