Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
> Hi, Chris,
>
> Some comments and questions:
>
> On 6/8/20 12:21 AM, Chris Wilson wrote:
> > The first "scheduler" was a topological sorting of requests into
> > priority order. The execution order was deterministic: the earliest
> > submitted, highest priority request would be executed first. Priority
> > inheritance ensured that inversions were kept at bay, and allowed us
> > to dynamically boost priorities (e.g. for interactive pageflips).
> >
> > The minimalistic timeslicing scheme was an attempt to introduce
> > fairness between long-running requests by evicting the active request
> > at the end of a timeslice and moving it to the back of its priority
> > queue (while ensuring that dependencies were kept in order). For
> > short-running requests from many clients of equal priority, the
> > scheme is still very much FIFO submission ordering, and as unfair as
> > before.
> >
> > To impose fairness, we need an external metric that ensures that
> > clients are interspersed, so that we do not execute one long chain
> > from client A before executing any of client B. This could be imposed
> > by the clients using fences based on an external clock; that is, they
> > only submit work for a "frame" at the frame interval, instead of
> > submitting as much work as they are able to. The standard SwapBuffers
> > approach is akin to double buffering: while one frame is being
> > executed, the next is being submitted, such that there is always a
> > maximum of two frames per client in the pipeline. Even this scheme
> > exhibits unfairness under load, as a single client will execute two
> > frames back to back before the next, and with enough clients,
> > deadlines will be missed.
> >
> > The idea introduced by BFS/MuQSS is that fairness is achieved by
> > metering with an external clock. Every request, when it becomes ready
> > to execute, is assigned a virtual deadline, and execution order is
> > then determined by the earliest deadline. Priority is used as a hint,
> > rather than a strict ordering: high priority requests have earlier
> > deadlines, but not necessarily earlier than outstanding work. Thus
> > work is executed in order of 'readiness', with timeslicing to demote
> > long-running work.
> >
> > The Achilles' heel of this scheduler is its strong preference for low
> > latency and its favouring of new queues. Whereas it was easy to
> > dominate the old scheduler by flooding it with many requests over a
> > short period of time, the new scheduler can be dominated by a
> > 'synchronous' client that waits for each of its requests to complete
> > before submitting the next. As such a client has no history, it is
> > always considered ready-to-run, and receives an earlier deadline than
> > the long-running requests.
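A minimal sketch of the virtual-deadline ordering described above; the
names (toy_request, deadline_offset, pick_next) and the exact
priority-to-offset mapping are hypothetical stand-ins, not the i915
implementation:

#include <stdint.h>
#include <stdio.h>

struct toy_request {
	const char *client;
	int prio;		/* scheduling hint, not a strict order */
	uint64_t deadline;	/* virtual deadline, in ns */
	int done;
};

/*
 * Higher priority shrinks the deadline offset, so priority biases the
 * execution order without unconditionally jumping ahead of work that
 * became ready earlier.
 */
static uint64_t deadline_offset(int prio)
{
	const uint64_t slice = 1000 * 1000; /* nominal 1ms timeslice */

	if (prio > 4)
		prio = 4;
	if (prio < -4)
		prio = -4;

	return prio >= 0 ? slice >> prio : slice << -prio;
}

/*
 * A request is stamped when it becomes ready (all dependencies have
 * signalled); re-stamping the active request on timeslice expiry is
 * what demotes long-running work behind newly ready peers.
 */
static void mark_ready(struct toy_request *rq, uint64_t now)
{
	rq->deadline = now + deadline_offset(rq->prio);
}

/* Execution order is simply earliest virtual deadline first. */
static struct toy_request *pick_next(struct toy_request *q, int n)
{
	struct toy_request *best = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (!q[i].done && (!best || q[i].deadline < best->deadline))
			best = &q[i];
	}

	return best;
}

int main(void)
{
	struct toy_request q[] = {
		{ "A", 0 },	/* default priority, ready at t=0 */
		{ "B", 2 },	/* high priority, ready last */
		{ "C", -1 },	/* low priority, ready in between */
	};
	struct toy_request *rq;

	mark_ready(&q[0], 0);
	mark_ready(&q[1], 300 * 1000);	/* ready at t=0.3ms */
	mark_ready(&q[2], 100 * 1000);	/* ready at t=0.1ms */

	while ((rq = pick_next(q, 3))) {
		printf("run %s (deadline %lluus)\n", rq->client,
		       (unsigned long long)rq->deadline / 1000);
		rq->done = 1;
	}

	return 0;
}

This prints B, A, C: B becomes ready last but its priority earns the
earliest deadline (0.3ms + 0.25ms), while C is demoted behind both
(0.1ms + 2ms). On a real timeslice expiry, re-running mark_ready() on
the active request is what pushes a persistent client behind newly
ready work.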
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
> >  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
> >  drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
> >  drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
> >  drivers/gpu/drm/i915/i915_request.h           |   4 +-
> >  drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
> >  drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
> >  drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
> >  .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
> >  drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
> >  .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
> >  16 files changed, 484 insertions(+), 362 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
>
> Do we have timings to back this change up? Would it make sense to have
> a configurable scheduler choice?

gem_wsim workloads with different load balancers, varying the number of
clients, % variation from the previous patch:

[ministat distribution plot of the per-client-count results; the ASCII
rendering was garbled in transit and is not reproduced]

Clients   N     Min     Max   Median          Avg       Stddev
      1  63   -8.2     5.4   -0.045    -0.02375   0.094722134
      2  63  -15.96   19.28  -0.64     -1.05      2.2428076
      4  63   -5.11    2.95  -1.15     -1.0683333 0.72382651
      8  63   -5.63    1.85  -0.905    -0.87122449 0.73390971

The wildest swings there do appear to be a result of interrupt latency,
with the ~-1% impact coming from execution order and more context
switching.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx