Hi,

On 2023-09-19 01:01, Matthew Brost wrote:
> If the TDR is set to a very small value it can fire before the
> submission is started in the function drm_sched_start. The submission is
> expected to running when the TDR fires, fix this ordering so this
> expectation is always met.
>
> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 09ef07b9e9d5..a5cc9b6c2faa 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -684,10 +684,10 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>  			drm_sched_job_done(s_job, -ECANCELED);
>  	}
>
> +	drm_sched_submit_start(sched);
> +
>  	if (full_recovery)
>  		drm_sched_start_timeout_unlocked(sched);
> -
> -	drm_sched_submit_start(sched);
>  }
>  EXPORT_SYMBOL(drm_sched_start);

No.

A timeout timer should be started before we submit anything down to the hardware.
See Message-ID: <ed3aca10-8a9f-4698-92f4-21558fa6cfe3@xxxxxxx>,
and Message-ID: <8e5eab14-9e55-42c9-b6ea-02fcc591266d@xxxxxxx>.

You shouldn't start the TDR at an arbitrarily late time after job submission
to the hardware. To close that window, the timer is started before jobs are
submitted to the hardware.

If a very small TDR value fires too early, one possibility is to increase
the timeout value.

-- 
Regards,
Luben