On Thu, Jul 25, 2024 at 09:42:08AM +0200, Christian König wrote:
> Am 25.07.24 um 01:44 schrieb Matthew Brost:
> > Only start the timeout in drm_sched_job_begin on the first job being
> > added to the pending list, as if the pending list is non-empty the TDR
> > has already been started. It is problematic to restart the TDR as it
> > will extend the TDR period for an already running job, potentially
> > leading to dma-fence signaling for a very long period of time with a
> > continuous stream of jobs.
>
> Mhm, that should be unnecessary. drm_sched_start_timeout() should only start
> the timeout, but never re-start it.
>

That function only checks that the pending list is not empty, so it does
indeed restart the timeout here. That is the correct behavior for some of
the callers, e.g. drm_sched_tdr_queue_imm and drm_sched_get_finished_job,
so IMO it is best to fix this here.

Also FWIW, on Xe I wrote a test which submitted a never-ending spinner and
then submitted a job every second on the same queue in a loop, and observed
that the spinner did not get canceled for a long time. After this patch, the
spinner correctly timed out after 5 seconds (our default TDR period).

Matt

> Could be that this isn't working properly.
>
> Regards,
> Christian.
>
> >
> > Cc: Christian König <christian.koenig@xxxxxxx>
> > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 7e90c9f95611..feeeb9dbeb86 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -540,7 +540,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> >   	spin_lock(&sched->job_list_lock);
> >   	list_add_tail(&s_job->list, &sched->pending_list);
> > -	drm_sched_start_timeout(sched);
> > +	if (list_is_singular(&sched->pending_list))
> > +		drm_sched_start_timeout(sched);
> >   	spin_unlock(&sched->job_list_lock);
> >   }
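
For illustration only, below is a minimal userspace sketch (not kernel code;
the list helpers are local re-implementations that merely mirror the
semantics of the kernel's list_add_tail() and list_is_singular()) showing why
the check added by the patch fires exactly once, when the first job lands on
an empty pending list, and stays false for every subsequent job while the
timeout is already running:

/*
 * Userspace sketch: demonstrate that list_is_singular() right after
 * list_add_tail() is true only on the empty -> non-empty transition.
 */
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

static bool list_is_singular(const struct list_head *head)
{
	/* Non-empty and first entry == last entry, i.e. exactly one job. */
	return head->next != head && head->next == head->prev;
}

int main(void)
{
	struct list_head pending_list, job1, job2;

	init_list_head(&pending_list);

	list_add_tail(&job1, &pending_list);
	/* First job: list went empty -> non-empty, arm the timeout. */
	printf("after job1: singular=%d -> start timeout\n",
	       list_is_singular(&pending_list));

	list_add_tail(&job2, &pending_list);
	/* Later jobs: timeout already armed, leave it alone. */
	printf("after job2: singular=%d -> do not restart timeout\n",
	       list_is_singular(&pending_list));

	return 0;
}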