Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Wed, 27 Sep 2023 09:25:14 +0200

On Wed, 27 Sep 2023 02:13:59 +0200
Danilo Krummrich <dakr@xxxxxxxxxx> wrote:

> On 9/26/23 22:43, Luben Tuikov wrote:
> > Hi,
> > 
> > On 2023-09-24 18:43, Danilo Krummrich wrote:  
> >> Currently, job flow control is implemented simply by limiting the amount
> >> of jobs in flight. Therefore, a scheduler is initialized with a
> >> submission limit that corresponds to a certain amount of jobs.  
> > 
> > "certain"? How about this instead:
> > " ... that corresponds to the number of jobs which can be sent
> >    to the hardware."?
> >   
> >>
> >> This implies that for each job drivers need to account for the maximum  
> >                                  ^,
> > Please add a comma after "job".
> >   
> >> job size possible in order to not overflow the ring buffer.  
> > 
> > Well, different hardware designs would implement this differently.
> > Ideally, you only want pointers into the ring buffer, and then
> > the hardware consumes as much as it can. But this is a moot point
> > and it's always a good idea to have a "job size" hint from the client.
> > So this is a good patch.
> > 
> > Ideally, you want to say that the hardware needs to be able to
> > accommodate the number of jobs which can fit in the hardware
> > queue times the largest job. This is a waste of resources
> > however, and it is better to give a hint as to the size of a job,
> > by the client. If the hardware can peek and understand dependencies,
> > on top of knowing the "size of the job", it can be an extremely
> > efficient scheduler.
> >   
> >>
> >> However, there are drivers, such as Nouveau, where the job size has a
> >> rather large range. For such drivers it can easily happen that job
> >> submissions not even filling the ring by 1% can block subsequent
> >> submissions, which, in the worst case, can lead to the ring running dry.
> >>
> >> In order to overcome this issue, allow for tracking the actual job size
> >> instead of the amount job jobs. Therefore, add a field to track a job's  
> > 
> > "the amount job jobs." --> "the number of jobs."  
> 
> Yeah, I somehow manage to always get this wrong, which I guess you noticed
> below already.
> 
> That's all good points below - gonna address them.
> 
> Did you see Boris' response regarding a separate callback in order to fetch
> the job's submission units dynamically? Since this is needed by PowerVR, I'd
> like to include this in V2. What's your take on that?
> 
> My only concern with that would be that, if I got what Boris was saying
> correctly, calling
> 
> WARN_ON(s_job->submission_units > sched->submission_limit);
> 
> from drm_sched_can_queue() wouldn't work anymore, since this could indeed happen
> temporarily. I think this was also Christian's concern.

Actually, I think it's fine to account for the max job size in the
first check: we're unlikely to have so many native fence waits that our
job can't fit in an empty ring buffer.
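
To make that concrete, here's a rough user-space sketch of the split I have
in mind. It is not the drm_sched code: every sketch_* name, the max_units /
cur_units / queued_units fields and the callback stand-in are made up for
illustration, and only submission_limit and the idea of per-job submission
units come from the patch. The worst-case size is validated once against the
limit (where the WARN_ON would sit), and the dynamic value is only used to
decide whether the job fits in what's currently free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; these are not the kernel's structures. */
struct sketch_sched {
	unsigned int submission_limit;	/* ring capacity, in submission units */
	unsigned int submission_count;	/* units currently in flight */
};

struct sketch_job {
	unsigned int max_units;		/* worst-case size, known up front */
	unsigned int cur_units;		/* size right now, may change over time */
	unsigned int queued_units;	/* what was actually charged at queue time */
};

static void sketch_sched_init(struct sketch_sched *sched, unsigned int limit)
{
	assert(limit > 0);
	sched->submission_limit = limit;
	sched->submission_count = 0;
}

/*
 * First check: compare the worst-case size against the whole ring.
 * A job failing this can never be queued, even on an empty ring, so
 * this is where a warning belongs.
 */
static bool sketch_job_fits_ring(const struct sketch_sched *sched,
				 const struct sketch_job *job)
{
	if (job->max_units > sched->submission_limit) {
		fprintf(stderr, "job needs %u units, ring only has %u\n",
			job->max_units, sched->submission_limit);
		return false;
	}
	return true;
}

/* Stand-in for a driver callback returning the job's current size. */
static unsigned int sketch_job_units(const struct sketch_job *job)
{
	return job->cur_units;
}

/* Second check: does the current size fit in the free space right now? */
static bool sketch_can_queue(const struct sketch_sched *sched,
			     const struct sketch_job *job)
{
	unsigned int free_units =
		sched->submission_limit - sched->submission_count;

	return sketch_job_units(job) <= free_units;
}

static bool sketch_queue(struct sketch_sched *sched, struct sketch_job *job)
{
	if (!sketch_can_queue(sched, job))
		return false;

	/* Remember what we charged so completion gives back the same amount. */
	job->queued_units = sketch_job_units(job);
	sched->submission_count += job->queued_units;
	return true;
}

static void sketch_complete(struct sketch_sched *sched, struct sketch_job *job)
{
	sched->submission_count -= job->queued_units;
	job->queued_units = 0;
}
```

With a limit of 64 units, a job whose max_units is 100 gets rejected by the
first check no matter how empty the ring is, while a job with max_units 8
simply waits in sketch_can_queue() until enough units have been returned.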