Am 03.08.2018 um 19:31 schrieb Lucas Stach: > Am Montag, den 06.08.2018, 14:57 +0200 schrieb Christian König: >> Am 03.08.2018 um 16:29 schrieb Lucas Stach: >>> drm_sched_job_finish() is a work item scheduled for each finished job on >>> a unbound system workqueue. This means the workers can execute out of order >>> with regard to the real hardware job completions. >>> >>> If this happens queueing a timeout worker for the first job on the ring >>> mirror list is wrong, as this may be a job which has already finished >>> executing. Fix this by reorganizing the code to always queue the worker >>> for the next job on the list, if this job hasn't finished yet. This is >>> robust against a potential reordering of the finish workers. >>> >>> Also move out the timeout worker cancelling, so that we don't need to >>> take the job list lock twice. As a small optimization list_del is used >>> to remove the job from the ring mirror list, as there is no need to >>> reinit the list head in the job we are about to free. >>> >>>>> Signed-off-by: Lucas Stach <l.stach at pengutronix.de> >>> --- >>> v2: - properly handle last job in the ring >>>      - check correct fence for compeletion >>> --- >>>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 ++++++++++------------ >>>  1 file changed, 10 insertions(+), 12 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c >>> index 44d480768dfe..574875e2c206 100644 >>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c >>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c >>> @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work) >>>>>      finish_work); >>>>>   struct drm_gpu_scheduler *sched = s_job->sched; >>> >>>>> - /* remove job from ring_mirror_list */ >>>>> - spin_lock(&sched->job_list_lock); >>>>> - list_del_init(&s_job->node); >>>>> - if (sched->timeout != MAX_SCHEDULE_TIMEOUT) { >>>>> - struct drm_sched_job *next; >>> - >>>>> - spin_unlock(&sched->job_list_lock); >>>>> + if (sched->timeout != MAX_SCHEDULE_TIMEOUT) >>>   cancel_delayed_work_sync(&s_job->work_tdr); >> That is unfortunately still racy here. >> >> Between canceling the job and removing it from the list someone could >> actually start the time (in theory) :) >> >> Cancel it, remove it from the list and cancel it again. > I don't see how. If we end up in this worker the finished fence of the > job is already certainly signaled (as this is what triggers queueing of > the worker). So even if some other worker manages to find this job as > the next job in the list, the dma_fence_is_signaled check should > prevent the timeout worker from getting scheduled again. Well that makes sense, but is a bit hard to understand. Anyway, please remove the extra "if" check. With that done the patch is Reviewed-by: Christian König <christian.koenig at amd.com>. Thanks, Christian. > >> BTW: You could completely drop the "if (sched->timeout != >> MAX_SCHEDULE_TIMEOUT)" here cause canceling is harmless as long as the >> structure is initialized. > Right. > > Regards, > Lucas > >> Christian. >> >>>>> - spin_lock(&sched->job_list_lock); >>> >>>>> - /* queue TDR for next job */ >>>>> - next = list_first_entry_or_null(&sched->ring_mirror_list, >>>>> - struct drm_sched_job, node); >>>>> + spin_lock(&sched->job_list_lock); >>>>> + /* queue TDR for next job */ >>>>> + if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >>>>> +     !list_is_last(&s_job->node, &sched->ring_mirror_list)) { >>>>> + struct drm_sched_job *next = list_next_entry(s_job, node); >>> >>>>> - if (next) >>>>> + if (!dma_fence_is_signaled(&next->s_fence->finished)) >>>>>   schedule_delayed_work(&next->work_tdr, sched->timeout); >>>>>   } >>>>> + /* remove job from ring_mirror_list */ >>>>> + list_del(&s_job->node); >>>>>   spin_unlock(&sched->job_list_lock); >>> + >>>>>   dma_fence_put(&s_job->s_fence->finished); >>>>>   sched->ops->free_job(s_job); >>>  } >> > _______________________________________________ > amd-gfx mailing list > amd-gfx at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx