> -----Original Message-----
> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Sent: Friday, September 20, 2019 9:04 AM
> To: Bloomfield, Jon <jon.bloomfield@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> Subject: RE: [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption
>
> Quoting Bloomfield, Jon (2019-09-20 16:50:57)
> > > -----Original Message-----
> > > From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tvrtko Ursulin
> > > Sent: Friday, September 20, 2019 8:12 AM
> > > To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption
> > >
> > > On 20/09/2019 15:57, Chris Wilson wrote:
> > > > Quoting Chris Wilson (2019-09-20 09:36:24)
> > > >> Force bonded requests to run on distinct engines so that they cannot be
> > > >> shuffled onto the same engine where timeslicing will reverse the order.
> > > >> A bonded request will often wait on a semaphore signaled by its master,
> > > >> creating an implicit dependency -- if we ignore that implicit dependency
> > > >> and allow the bonded request to run on the same engine and before its
> > > >> master, we will cause a GPU hang.
> > > >
> > > > Thinking more, it should not directly cause a GPU hang, as the stuck
> > > > request should be timesliced away, and each preemption should be enough
> > > > to keep hangcheck at bay (though we have evidence it may not). So at best
> > > > it runs at half-speed, at worst a third (if my model is correct).
> > >
> > > But I think it is still correct to do since we don't have the coupling
> > > information on re-submit. Hm.. but don't we need to prevent the slave from
> > > changing engines as well?
> >
> > Unless I'm missing something, the proposal here is to set the engines in
> > stone at first submission, and never change them?
>
> For submission here, think execution (submission to the actual HW). (We have
> 2 separate phases that all like to be called submit()!)
>
> > If so, that does sound overly restrictive, and will prevent any kind of
> > rebalancing as workloads (of varying slave counts) come and go.
>
> We are only restricting this request, not the contexts. We still have
> balancing overall, just not instantaneous balancing if we timeslice out
> of this request -- we put it back onto the "same" engine and not another.
> Which is in some ways less than ideal, although strictly we are only
> saying don't put it back onto an engine we have earmarked for our bonded
> request, and so we avoid contending with our parallel request, reducing
> that to serial (and often bad) behaviour.
>
> [So at the end of this statement, I'm more happy with the restriction ;]
>
> > During the original design it was called out that the workloads should be
> > pre-empted atomically. That allows the entire bonding mask to be
> > re-evaluated at every context switch and so we can then rebalance. Still
> > not easy to achieve, I agree :-(
>
> The problem with that statement is that atomic implies a global
> scheduling decision. Blood, sweat and tears.

Agreed - it isn't fun. Perhaps it doesn't matter anyway. Once GuC is
offloading the scheduling it should be able to do a little more wrt
rebalancing. Let's make it a GuC headache instead.

> Of course, with your endless scheme, scheduling is all in the purview of
> the user :)

Hey, don't tarnish me with that brush. I don't like it either.
Actually, it's your scheme technically. I just asked for a way to enable HPC
workloads, and you enthusiastically offered heartbeats & non-persistence. So
shall history be written :-)

> -Chris
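
For readers following the thread, here is a tiny standalone sketch of the
restriction being discussed. It is not the i915 code -- the struct, field and
function names below are invented for illustration -- but it models the idea:
once a master/bonded pair has been chosen for execution, each request drops
the engine earmarked for its partner from its own allowed-engine mask, so a
resubmission after a timeslice preemption cannot shuffle both onto the same
engine and let the bond overtake its master.

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch only, not the i915 implementation; all names are
 * made up.  Each request carries a bitmask of engines it may run on; when
 * a master/bonded pair first executes, each side masks out the engine
 * chosen for its partner, so a later resubmission (e.g. after being
 * timesliced out) cannot land both requests on the same engine.
 */
struct fake_request {
	const char *name;
	uint32_t allowed_engines;	/* one bit per physical engine */
	int engine;			/* engine picked at first execution */
};

/* Pick the lowest-numbered engine still permitted by the mask. */
static int pick_engine(uint32_t mask)
{
	for (int i = 0; i < 32; i++)
		if (mask & (1u << i))
			return i;
	return -1;
}

static void execute_pair(struct fake_request *master, struct fake_request *bond)
{
	master->engine = pick_engine(master->allowed_engines);
	/* the bonded request must start on a distinct engine */
	bond->engine = pick_engine(bond->allowed_engines & ~(1u << master->engine));

	/* earmark: neither request may later migrate onto its partner's engine */
	master->allowed_engines &= ~(1u << bond->engine);
	bond->allowed_engines &= ~(1u << master->engine);
}

int main(void)
{
	/* both start out allowed on four engines (bits 0-3) */
	struct fake_request master = { "master", 0xf, -1 };
	struct fake_request bond   = { "bond",   0xf, -1 };

	execute_pair(&master, &bond);
	printf("%s: engine %d, mask now %#x\n",
	       master.name, master.engine, master.allowed_engines);
	printf("%s: engine %d, mask now %#x\n",
	       bond.name, bond.engine, bond.allowed_engines);
	return 0;
}

Compiled as plain C, this puts the master on engine 0 and the bond on engine 1
and narrows each mask so neither can be resubmitted onto the other's engine;
the discussion above is about whether, and for how long, to keep such a
narrowing in place rather than rebalancing on every context switch.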