On Fri, 27 Oct 2023 10:32:52 -0400 Luben Tuikov <ltuikov89@xxxxxxxxx> wrote: > On 2023-10-27 04:25, Boris Brezillon wrote: > > Hi Danilo, > > > > On Thu, 26 Oct 2023 18:13:00 +0200 > > Danilo Krummrich <dakr@xxxxxxxxxx> wrote: > > > >> Currently, job flow control is implemented simply by limiting the number > >> of jobs in flight. Therefore, a scheduler is initialized with a credit > >> limit that corresponds to the number of jobs which can be sent to the > >> hardware. > >> > >> This implies that for each job, drivers need to account for the maximum > >> job size possible in order to not overflow the ring buffer. > >> > >> However, there are drivers, such as Nouveau, where the job size has a > >> rather large range. For such drivers it can easily happen that job > >> submissions not even filling the ring by 1% can block subsequent > >> submissions, which, in the worst case, can lead to the ring run dry. > >> > >> In order to overcome this issue, allow for tracking the actual job size > >> instead of the number of jobs. Therefore, add a field to track a job's > >> credit count, which represents the number of credits a job contributes > >> to the scheduler's credit limit. > >> > >> Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx> > >> --- > >> Changes in V2: > >> ============== > >> - fixed up influence on scheduling fairness due to consideration of a job's > >> size > >> - If we reach a ready entity in drm_sched_select_entity() but can't actually > >> queue a job from it due to size limitations, just give up and go to sleep > >> until woken up due to a pending job finishing, rather than continue to try > >> other entities. > >> - added a callback to dynamically update a job's credits (Boris) > > > > This callback seems controversial. I'd suggest dropping it, so the > > patch can be merged. > > Sorry, why is it controversial? (I did read back-and-forth above, but it wasn't clear > why it is /controversial/.) That's a question for Christian, I guess. I didn't quite get what he was worried about, other than this hook introducing a new way for drivers to screw things up by returning funky/invalid credits (which we can report with WARN_ON()s). But let's be honest, there's probably a hundred different ways (if not more) drivers can shoot themselves in the foot with drm_sched already... > > I believe only drivers are privy to changes in the credit availability as their > firmware and hardware executes new jobs and finishes others, and so this "update" > here is essential--leaving it only to prepare_job() wouldn't quite fulfill the vision > of why the credit mechanism introduced by this patch in the first place. I kinda agree with you, even if I wouldn't so pessimistic as to how useful this patch would be without the ->update_job_credits() hook (it already makes the situation a lot better for Nouveau and probably other drivers too).