On 2023-10-27 12:41, Boris Brezillon wrote: > On Fri, 27 Oct 2023 10:32:52 -0400 > Luben Tuikov <ltuikov89@xxxxxxxxx> wrote: > >> On 2023-10-27 04:25, Boris Brezillon wrote: >>> Hi Danilo, >>> >>> On Thu, 26 Oct 2023 18:13:00 +0200 >>> Danilo Krummrich <dakr@xxxxxxxxxx> wrote: >>> >>>> Currently, job flow control is implemented simply by limiting the number >>>> of jobs in flight. Therefore, a scheduler is initialized with a credit >>>> limit that corresponds to the number of jobs which can be sent to the >>>> hardware. >>>> >>>> This implies that for each job, drivers need to account for the maximum >>>> job size possible in order to not overflow the ring buffer. >>>> >>>> However, there are drivers, such as Nouveau, where the job size has a >>>> rather large range. For such drivers it can easily happen that job >>>> submissions not even filling the ring by 1% can block subsequent >>>> submissions, which, in the worst case, can lead to the ring run dry. >>>> >>>> In order to overcome this issue, allow for tracking the actual job size >>>> instead of the number of jobs. Therefore, add a field to track a job's >>>> credit count, which represents the number of credits a job contributes >>>> to the scheduler's credit limit. >>>> >>>> Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx> >>>> --- >>>> Changes in V2: >>>> ============== >>>> - fixed up influence on scheduling fairness due to consideration of a job's >>>> size >>>> - If we reach a ready entity in drm_sched_select_entity() but can't actually >>>> queue a job from it due to size limitations, just give up and go to sleep >>>> until woken up due to a pending job finishing, rather than continue to try >>>> other entities. >>>> - added a callback to dynamically update a job's credits (Boris) >>> >>> This callback seems controversial. I'd suggest dropping it, so the >>> patch can be merged. >> >> Sorry, why is it controversial? (I did read back-and-forth above, but it wasn't clear >> why it is /controversial/.) > > That's a question for Christian, I guess. I didn't quite get what he > was worried about, other than this hook introducing a new way for > drivers to screw things up by returning funky/invalid credits (which we It's up to the driver--they can test, test, test, and fix their code and so on. We can only do so much and shouldn't be baby-sitting drivers ad nauseam. Drivers can also not define this callback. :-) > can report with WARN_ON()s). But let's be honest, there's probably a > hundred different ways (if not more) drivers can shoot themselves in the > foot with drm_sched already... Yes, that's true. So there's no worries with this hook. > >> >> I believe only drivers are privy to changes in the credit availability as their >> firmware and hardware executes new jobs and finishes others, and so this "update" >> here is essential--leaving it only to prepare_job() wouldn't quite fulfill the vision >> of why the credit mechanism introduced by this patch in the first place. > > I kinda agree with you, even if I wouldn't so pessimistic as to how > useful this patch would be without the ->update_job_credits() hook > (it already makes the situation a lot better for Nouveau and probably > other drivers too). Sure, and that's a good thing. The heart of the dynamic credit scheme this patch is introducing *is* update_job_credits() callback. Without it, it's just about like the current job flow-control scheme we have with varied job weights (credits). Remember, it is an optional callback and driver can choose NOT to define it--simple. :-) So, I'm very excited about this, and see a wide range of applications and tricks drivers can do with the credit scheme (albeit had it been an "int" bwha-ha-ha-ha ]:-> ). Have a good weekend everyone! -- Regards, Luben
Attachment:
OpenPGP_0x4C15479431A334AF.asc
Description: OpenPGP public key
Attachment:
OpenPGP_signature.asc
Description: OpenPGP digital signature