Re: [PATCH v5 7/7] drm/sched: Add helper to queue TDR immediately for current and future jobs

Matthew Brost <matthew.brost@xxxxxxxxx> · Mon, 16 Oct 2023 15:03:37 +0000

On Fri, Oct 13, 2023 at 11:04:47PM -0400, Luben Tuikov wrote:
> On 2023-10-11 19:58, Matthew Brost wrote:
> > Add helper to queue TDR immediately for current and future jobs. This is
> > used in Xe, a new Intel GPU driver, to trigger a TDR to cleanup a
> > drm_scheduler that encounter errors.
> 
> I think the best (most optimal) thing to do is to remove the last sentence
> mentioning Xe. It is irrelevant to this patch. This patch is functional
> as is, and worth having it as is.
> 
> So it's best to have just:
> 
> 	Add a helper whereby a driver can invoke TDR immediately.
> 

+1.

> Also remove "for current and future jobs" from the title, as it is
> implied by how TDR works. We want to say less.
> 
> 	drm/sched: Add a helper to queue TDR immediately
>

Yep, my bad I forgot to adjust the commit message in this rev. Will fix.

Matt

> These are only GPU scheduler changes, worth having on their own. The fact
> that a new (future as of this moment) driver (Xe) would use them is irrelevant
> at the moment. Other drivers (new, current?) would most likely end up using the changes
> of these patches, and these changes go in on their own merit.
> 
> Regards,
> Luben
> 
> > 
> > v2:
> >  - Drop timeout args, rename function, use mod delayed work (Luben)
> > v3:
> >  - s/XE/Xe (Luben)
> >  - present tense in commit message (Luben)
> >  - Adjust comment for drm_sched_tdr_queue_imm (Luben)
> > 
> > Cc: Luben Tuikov <luben.tuikov@xxxxxxx>
> > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 18 +++++++++++++++++-
> >  include/drm/gpu_scheduler.h            |  1 +
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c4d5c3d265a8..f2846745b067 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -431,7 +431,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
> >  
> >  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> >  	    !list_empty(&sched->pending_list))
> > -		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
> > +		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
> >  }
> >  
> >  static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> > @@ -441,6 +441,22 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> >  	spin_unlock(&sched->job_list_lock);
> >  }
> >  
> > +/**
> > + * drm_sched_tdr_queue_imm: - immediately start job timeout handler
> > + *
> > + * @sched: scheduler for which the timeout handling should be started.
> > + *
> > + * Start timeout handling immediately for the named scheduler.
> > + */
> > +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
> > +{
> > +	spin_lock(&sched->job_list_lock);
> > +	sched->timeout = 0;
> > +	drm_sched_start_timeout(sched);
> > +	spin_unlock(&sched->job_list_lock);
> > +}
> > +EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
> > +
> >  /**
> >   * drm_sched_fault - immediately start timeout handler
> >   *
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 625ffe040de3..998b32b8d212 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -568,6 +568,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >  				    struct drm_gpu_scheduler **sched_list,
> >                                     unsigned int num_sched_list);
> >  
> > +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
> >  void drm_sched_job_cleanup(struct drm_sched_job *job);
> >  void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
> >  bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
>