On Monday, 06.08.2018, 10:12 +0200, Christian König wrote:
> On 03.08.2018 at 19:31, Lucas Stach wrote:
> > On Monday, 06.08.2018, 14:57 +0200, Christian König wrote:
> > > On 03.08.2018 at 16:29, Lucas Stach wrote:
> > > > drm_sched_job_finish() is a work item scheduled for each finished job on
> > > > an unbound system workqueue. This means the workers can execute out of
> > > > order with regard to the real hardware job completions.
> > > > 
> > > > If this happens, queueing a timeout worker for the first job on the ring
> > > > mirror list is wrong, as this may be a job which has already finished
> > > > executing. Fix this by reorganizing the code to always queue the worker
> > > > for the next job on the list, if this job hasn't finished yet. This is
> > > > robust against a potential reordering of the finish workers.
> > > > 
> > > > Also move out the timeout worker cancelling, so that we don't need to
> > > > take the job list lock twice. As a small optimization, list_del is used
> > > > to remove the job from the ring mirror list, as there is no need to
> > > > reinit the list head in the job we are about to free.
> > > > 
> > > > Signed-off-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> > > > ---
> > > > v2: - properly handle last job in the ring
> > > >     - check correct fence for completion
> > > > ---
> > > >  drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++++++++++------------
> > > >  1 file changed, 10 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > index 44d480768dfe..574875e2c206 100644
> > > > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > > > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
> > > >  					   finish_work);
> > > >  	struct drm_gpu_scheduler *sched = s_job->sched;
> > > >  
> > > > -	/* remove job from ring_mirror_list */
> > > > -	spin_lock(&sched->job_list_lock);
> > > > -	list_del_init(&s_job->node);
> > > > -	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > > > -		struct drm_sched_job *next;
> > > > -
> > > > -		spin_unlock(&sched->job_list_lock);
> > > > +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> > > >  		cancel_delayed_work_sync(&s_job->work_tdr);
> > > 
> > > That is unfortunately still racy here.
> > > 
> > > Between canceling the job and removing it from the list someone could
> > > actually start the timer (in theory) :)
> > > 
> > > Cancel it, remove it from the list and cancel it again.
> > 
> > I don't see how. If we end up in this worker, the finished fence of the
> > job is already certainly signaled (as this is what triggers queueing of
> > the worker). So even if some other worker manages to find this job as
> > the next job in the list, the dma_fence_is_signaled check should
> > prevent the timeout worker from getting scheduled again.
> 
> Well that makes sense, but is a bit hard to understand.

I agree, all the possible parallelism and re-ordering makes this
seemingly simple part of the scheduler code a bit mind-bending. I've
added a comment in v3 to capture the line of thought for future
reference.
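To sketch that line of thought right here (a paraphrase, not the actual
v3 comment text), the cancel path would read roughly:

	/*
	 * Canceling the timeout without holding job_list_lock is safe: we
	 * only get here once the finished fence of this job has signaled.
	 * So even if another finish worker concurrently finds this job as
	 * the next one on the ring mirror list, its dma_fence_is_signaled()
	 * check will see the signaled fence and not re-arm our timeout
	 * worker.
	 */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
		cancel_delayed_work_sync(&s_job->work_tdr);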
Regards,
Lucas

> Anyway, please remove the extra "if" check. With that done the patch is
> 
> Reviewed-by: Christian König <christian.koenig@xxxxxxx>
> 
> Thanks,
> Christian.
> 
> > > BTW: You could completely drop the "if (sched->timeout !=
> > > MAX_SCHEDULE_TIMEOUT)" here, because canceling is harmless as long
> > > as the structure is initialized.
> > 
> > Right.
> > 
> > Regards,
> > Lucas
> > 
> > > Christian.
> > > 
> > > > -		spin_lock(&sched->job_list_lock);
> > > >  
> > > > -		/* queue TDR for next job */
> > > > -		next = list_first_entry_or_null(&sched->ring_mirror_list,
> > > > -						struct drm_sched_job, node);
> > > > +	spin_lock(&sched->job_list_lock);
> > > > +	/* queue TDR for next job */
> > > > +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > > > +	    !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
> > > > +		struct drm_sched_job *next = list_next_entry(s_job, node);
> > > >  
> > > > -		if (next)
> > > > +		if (!dma_fence_is_signaled(&next->s_fence->finished))
> > > >  			schedule_delayed_work(&next->work_tdr, sched->timeout);
> > > >  	}
> > > > +	/* remove job from ring_mirror_list */
> > > > +	list_del(&s_job->node);
> > > >  	spin_unlock(&sched->job_list_lock);
> > > > +
> > > >  	dma_fence_put(&s_job->s_fence->finished);
> > > >  	sched->ops->free_job(s_job);
> > > >  }
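For the record: reading "the extra "if" check" as that guard before
cancel_delayed_work_sync(), the whole function would then end up
looking roughly like this (my sketch of where v3 is heading, not the
final patch):

static void drm_sched_job_finish(struct work_struct *work)
{
	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
						   finish_work);
	struct drm_gpu_scheduler *sched = s_job->sched;

	/*
	 * Unconditional cancel; harmless even when no timeout is
	 * configured, since the delayed work is always initialized.
	 */
	cancel_delayed_work_sync(&s_job->work_tdr);

	spin_lock(&sched->job_list_lock);
	/* queue TDR for the next job, unless it has already finished */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
		struct drm_sched_job *next = list_next_entry(s_job, node);

		if (!dma_fence_is_signaled(&next->s_fence->finished))
			schedule_delayed_work(&next->work_tdr, sched->timeout);
	}
	/* remove job from ring_mirror_list; no list head reinit needed */
	list_del(&s_job->node);
	spin_unlock(&sched->job_list_lock);

	dma_fence_put(&s_job->s_fence->finished);
	sched->ops->free_job(s_job);
}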