On 2023-04-03 20:22, Matthew Brost wrote:
> If the TDR is set to a value, it can fire before a job is submitted in
> drm_sched_main. The job should be always be submitted before the TDR
> fires, fix this ordering.
>
> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 6ae710017024..4eac02d212c1 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1150,10 +1150,10 @@ static void drm_sched_main(struct work_struct *w)
>  		s_fence = sched_job->s_fence;
>  
>  		atomic_inc(&sched->hw_rq_count);
> -		drm_sched_job_begin(sched_job);
>  
>  		trace_drm_run_job(sched_job, entity);
>  		fence = sched->ops->run_job(sched_job);
> +		drm_sched_job_begin(sched_job);
>  		complete_all(&entity->entity_idle);
>  		drm_sched_fence_scheduled(s_fence);
>  

Not sure if this is correct. In drm_sched_job_begin() we add the job to the
"pending_list" (meaning it is pending execution in the hardware) and we also
start a timeout timer. Both of those should be started before the job is given
to the hardware.

If the timeout is set to too small a value, then that should probably be fixed
instead.

Regards,
Luben
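
To make the ordering concern concrete, here is a minimal user-space sketch. It is a toy model, not the scheduler code: every toy_* name is hypothetical, and it assumes hardware fast enough to finish the job before run_job() returns. Under that assumption, a job that is handed to the hardware before it is tracked cannot be matched when its completion arrives.

```c
#include <stdbool.h>

/* Toy stand-in for a scheduler job; all names are hypothetical. */
struct toy_job {
	bool on_pending_list;	/* models membership in the pending_list */
	bool timer_armed;	/* models the timeout timer being started */
	bool completion_seen;	/* true if completion found the job tracked */
};

/* Models drm_sched_job_begin(): track the job and start the timeout. */
static void toy_job_begin(struct toy_job *job)
{
	job->on_pending_list = true;
	job->timer_armed = true;
}

/* Models the hardware signalling completion: only a tracked job matches. */
static void toy_hw_complete(struct toy_job *job)
{
	job->completion_seen = job->on_pending_list;
}

/* Models run_job() on hardware that finishes before the call returns. */
static void toy_run_job(struct toy_job *job)
{
	toy_hw_complete(job);
}

/* Submit one job with either ordering; report if completion was seen. */
static bool toy_submit(bool begin_before_run)
{
	struct toy_job job = { false, false, false };

	if (begin_before_run) {
		toy_job_begin(&job);	/* current ordering */
		toy_run_job(&job);
	} else {
		toy_run_job(&job);	/* ordering proposed in the patch */
		toy_job_begin(&job);
	}
	return job.completion_seen;
}
```

In this model toy_submit(true) returns true and toy_submit(false) returns false: with the proposed ordering the completion arrives while the job is still untracked and no timeout is armed. Whether the real scheduler can hit that window depends on details the sketch leaves out.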