On Thu, Jan 07, 2021 at 04:45:31PM +0000, Chris Wilson wrote:
> Quoting Matthew Brost (2021-01-07 16:05:07)
> > On Tue, Dec 29, 2020 at 12:01:25PM +0000, Chris Wilson wrote:
> > > The first "scheduler" was a topological sorting of requests into priority order. The execution order was deterministic: the earliest submitted, highest priority request would be executed first. Priority inheritance ensured that inversions were kept at bay, and allowed us to dynamically boost priorities (e.g. for interactive pageflips).
> > >
> > > The minimalistic timeslicing scheme was an attempt to introduce fairness between long running requests, by evicting the active request at the end of a timeslice and moving it to the back of its priority queue (while ensuring that dependencies were kept in order). For short running requests from many clients of equal priority, the scheme is still very much FIFO submission ordering, and as unfair as before.
> > >
> > > To impose fairness, we need an external metric that ensures that clients are interspersed, so we don't execute one long chain from client A before executing any of client B. This could be imposed by the clients themselves by using fences based on an external clock; that is, they only submit work for a "frame" at frame intervals, instead of submitting as much work as they are able to. The standard SwapBuffers approach is akin to double buffering, where, as one frame is being executed, the next is being submitted, such that there is always a maximum of two frames per client in the pipeline, ideally maintaining consistent input-output latency. Even this scheme exhibits unfairness under load, as a single client will execute two frames back to back before the next, and with enough clients, deadlines will be missed.
> > >
> > > The idea introduced by BFS/MuQSS is that fairness is imposed by metering with an external clock. Every request, when it becomes ready to execute, is assigned a virtual deadline, and execution order is then determined by earliest deadline. Priority is used as a hint, rather than strict ordering, where high priority requests have earlier deadlines, but not necessarily earlier than outstanding work. Thus work is executed in order of 'readiness', with timeslicing to demote long running work.
> > >
> > > The Achilles' heel of this scheduler is its strong preference for low latency and its favouring of new queues. Whereas it was easy to dominate the old scheduler by flooding it with many requests over a short period of time, the new scheduler can be dominated by a 'synchronous' client that waits for each of its requests to complete before submitting the next. As such a client has no history, it is always considered ready-to-run and receives an earlier deadline than the long running requests. This is compensated for by refreshing the current execution's deadline and by disallowing preemption for timeslice shuffling.
> > >
> > > To check the impact on throughput (often the downfall of latency sensitive schedulers), we used gem_wsim to simulate various transcode workloads with different load balancers, and varying the number of competing [heterogeneous] clients.
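As a rough, self-contained illustration of the virtual-deadline scheme described in the quoted text above (the struct layout, the 1ms base slice and the priority weighting below are assumptions made for the example, not the driver's actual interface), before the gem_wsim results:

/* deadline_sketch.c: illustrative only, not i915 code. */
#include <stdint.h>
#include <stdio.h>

struct request {
	uint64_t ready_ns;	/* when the request became runnable */
	int prio;		/* hint: higher priority => shorter slice */
	uint64_t deadline_ns;	/* virtual deadline, used as the sort key */
};

/* Higher priority shrinks the slice, which yields an earlier deadline. */
static uint64_t prio_slice_ns(int prio)
{
	const uint64_t base_slice_ns = 1000000;	/* 1ms; an assumed default */

	if (prio > 0)
		return base_slice_ns >> (prio > 3 ? 3 : prio);
	if (prio < 0)
		return base_slice_ns << (-prio > 3 ? 3 : -prio);
	return base_slice_ns;
}

/* Stamp a deadline when the request becomes ready to execute. */
static void set_deadline(struct request *rq, uint64_t now_ns)
{
	rq->ready_ns = now_ns;
	rq->deadline_ns = now_ns + prio_slice_ns(rq->prio);
}

int main(void)
{
	struct request batch = { .prio = 0 };	/* long transcode job, ready early */
	struct request flip  = { .prio = 2 };	/* boosted pageflip, ready later */

	set_deadline(&batch, 0);
	set_deadline(&flip, 500000);

	/* Earliest virtual deadline runs first, regardless of submission order. */
	printf("%s runs first\n",
	       flip.deadline_ns < batch.deadline_ns ? "flip" : "batch");
	return 0;
}

The point of the sketch is only that priority feeds into the deadline rather than overriding it: the boosted request jumps ahead here because its shorter slice gives it an earlier deadline, but sufficiently old outstanding work would still win.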
> > >
> > > [ministat distribution plot of delta% throughput over the 108 runs omitted; summary statistics below]
> > >
> > >     N          Min         Max        Median           Avg      Stddev
> > >   108  -23.982194   28.421527  -0.077474828  -0.072650418  0.16179718
> > >
> > > The impact was, on average, 0.1% under contention, due to the change in context execution order and number of context switches. The biggest swings are due to the execution ordering favouring one client or another, and maybe there is room for improvement.
> >
> > I haven't dug into this series deeply, but it does seem plausible for execlist submission. However, this new scheduler seems completely broken for GuC submission.
>
> Underneath it's the same topological sort, just with a different key. At worst, that key can be the plain priority again. But that's pretty much immaterial in terms of deciding the order in which requests are transferred from one queue to another. Execution order wants to be based on resource prioritisation (beyond request prioritisation); ideally we would keep our fine grained scheduling control.
>

So maybe just hook for the key?

> I'm not convinced by the state of the fw's scheduler...
>

Can't really argue here, but I think it would be really helpful if you raised your concerns with the fw team so that they can be addressed.

> > The scheduler really isn't used that much with GuC submission, but when it is, the old one works perfectly. The vfunc for the scheduler is deleted in patch #25. I don't think that is a good idea, as it appears we are trending towards two separate schedulers.
>
> You still use the i915_request_set_priority() entry point as the caller into the backend bypass.
>

I thought we wanted to avoid backend specific branches, hence vfuncs.

> Isn't the current guc.fw a second closed driver for the same HW anyway?

Not sure I follow this.

> > Lastly, this series seems to be on the wrong list; I believe it should be posted elsewhere first.
>
> It's the right place for upstreaming bug fixes on stable platforms. And after a year of being here, that's a moot point.

Wow, I feel your pain if this has been out here for a year. But it is still the wrong place. I can help, and will help, with the reviews if it is in the correct spot.

Matt

> -Chris
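For reference, a minimal sketch of the "hook for the key" idea raised above: the common topological sort stays in one place, and each submission backend supplies the key it orders by (a virtual deadline for execlists, plain priority for a firmware scheduler). Every name here is made up for illustration and is not taken from the driver.

/* key_hook_sketch.c: illustrative only, not i915 code. */
#include <stdint.h>
#include <stdio.h>

struct sched_request {
	int prio;
	uint64_t deadline_ns;
};

struct sched_backend {
	const char *name;
	/* Smaller key sorts (and hence executes) first. */
	uint64_t (*sort_key)(const struct sched_request *rq);
};

static uint64_t key_by_deadline(const struct sched_request *rq)
{
	return rq->deadline_ns;	/* fair, clock-based ordering */
}

static uint64_t key_by_priority(const struct sched_request *rq)
{
	/* Invert so that a higher priority produces a smaller key. */
	return (uint64_t)(INT32_MAX - rq->prio);
}

static const struct sched_backend execlists = {
	.name = "execlists", .sort_key = key_by_deadline,
};
static const struct sched_backend guc = {
	.name = "guc", .sort_key = key_by_priority,
};

int main(void)
{
	const struct sched_request rq = { .prio = 2, .deadline_ns = 750000 };
	const struct sched_backend *backends[] = { &execlists, &guc };

	for (unsigned int i = 0; i < 2; i++)
		printf("%s sorts rq with key %llu\n", backends[i]->name,
		       (unsigned long long)backends[i]->sort_key(&rq));
	return 0;
}

The shape mirrors the vfunc discussion: only the key computation differs per backend, so the queue-transfer machinery does not need backend-specific branches.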