On Thu, May 04, 2023 at 01:23:05AM -0400, Luben Tuikov wrote:
> On 2023-04-03 20:22, Matthew Brost wrote:
> > If the TDR is set to a value, it can fire before a job is submitted in
> > drm_sched_main. The job should always be submitted before the TDR
> > fires; fix this ordering.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6ae710017024..4eac02d212c1 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1150,10 +1150,10 @@ static void drm_sched_main(struct work_struct *w)
> >  		s_fence = sched_job->s_fence;
> >
> >  		atomic_inc(&sched->hw_rq_count);
> > -		drm_sched_job_begin(sched_job);
> >
> >  		trace_drm_run_job(sched_job, entity);
> >  		fence = sched->ops->run_job(sched_job);
> > +		drm_sched_job_begin(sched_job);
> >  		complete_all(&entity->entity_idle);
> >  		drm_sched_fence_scheduled(s_fence);
> >
>
> Not sure if this is correct. In drm_sched_job_begin() we add the job to the
> "pending_list" (meaning it is pending execution in the hardware) and we also
> start a timeout timer. Both of those should be started before the job is
> given to the hardware.
>

The correct solution is probably to add the job to the pending list before
run_job() and kick the TDR after run_job().

> If the timeout is set to too small a value, then that should probably be
> fixed instead.
>

Disagree, a user should be able to set the TDR to any value it wants without
breaking the DRM scheduler.

Matt

> Regards,
> Luben
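
[Editor's note: the "pending list before run_job(), TDR after run_job()" split
suggested above could be sketched roughly as the diff below. This is untested
and is an assumption about what the final patch would look like: it open-codes
the pending_list insertion that drm_sched_job_begin() currently performs, and
defers only the timeout arming (drm_sched_start_timeout(), an existing static
helper in sched_main.c) until after the hand-off to the hardware. The locking
around drm_sched_start_timeout()'s list_empty() check would need verifying.]

```diff
 		atomic_inc(&sched->hw_rq_count);
-		drm_sched_job_begin(sched_job);
+
+		/* Put the job on the pending list before the HW can see it... */
+		spin_lock(&sched->job_list_lock);
+		list_add_tail(&sched_job->list, &sched->pending_list);
+		spin_unlock(&sched->job_list_lock);
 
 		trace_drm_run_job(sched_job, entity);
 		fence = sched->ops->run_job(sched_job);
+		/* ...but only arm the TDR once the job has been handed off. */
+		drm_sched_start_timeout(sched);
 		complete_all(&entity->entity_idle);
 		drm_sched_fence_scheduled(s_fence);
```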