Hi Lucas,

thanks a lot for taking care of that, but there is one thing you have
missed: it is perfectly possible that the job is the last one on the
list, and list_next_entry() doesn't test for that, i.e. it never
returns NULL.

BTW: there are also quite a lot of other things we could optimize here.
So if you need something to do, just ping me :)

Cheers,
Christian.

On 03.08.2018 at 15:18, Lucas Stach wrote:
> drm_sched_job_finish() is a work item scheduled for each finished job on
> an unbound system workqueue. This means the workers can execute out of
> order with regard to the real hardware job completions.
>
> If this happens, queueing a timeout worker for the first job on the ring
> mirror list is wrong, as this may be a job which has already finished
> executing. Fix this by reorganizing the code to always queue the worker
> for the next job on the list, if this job hasn't finished yet. This is
> robust against a potential reordering of the finish workers.
>
> Also move out the timeout worker cancelling, so that we don't need to
> take the job list lock twice. As a small optimization, list_del is used
> to remove the job from the ring mirror list, as there is no need to
> reinit the list head in the job we are about to free.
>
> Signed-off-by: Lucas Stach <l.stach at pengutronix.de>
> ---
>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
>  1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 44d480768dfe..0be2859d7b80 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
>  						   finish_work);
>  	struct drm_gpu_scheduler *sched = s_job->sched;
>  
> -	/* remove job from ring_mirror_list */
> +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> +		cancel_delayed_work_sync(&s_job->work_tdr);
> +
>  	spin_lock(&sched->job_list_lock);
> -	list_del_init(&s_job->node);
>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> -		struct drm_sched_job *next;
> -
> -		spin_unlock(&sched->job_list_lock);
> -		cancel_delayed_work_sync(&s_job->work_tdr);
> -		spin_lock(&sched->job_list_lock);
> +		struct drm_sched_job *next = list_next_entry(s_job, node);
>  
>  		/* queue TDR for next job */
> -		next = list_first_entry_or_null(&sched->ring_mirror_list,
> -						struct drm_sched_job, node);
> -
> -		if (next)
> +		if (next != s_job &&
> +		    !dma_fence_is_signaled(&s_job->s_fence->finished))
>  			schedule_delayed_work(&next->work_tdr, sched->timeout);
>  	}
> +	/* remove job from ring_mirror_list */
> +	list_del(&s_job->node);
>  	spin_unlock(&sched->job_list_lock);
> +
>  	dma_fence_put(&s_job->s_fence->finished);
>  	sched->ops->free_job(s_job);
>  }