On gen9, we see an effect where, when we perform an element switch just
as the first context completes execution, that switch takes twice as
long, as if it first reloads the completed context. That is, we observe
the cost of context1 -> idle -> context1 -> context2 as being twice the
cost of the same operation on gen8. The impact of this is incredibly
rare outside of microbenchmarks focused on assessing the throughput of
context switches.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
---
I think this is a microbenchmark too far, as there is no real-world
impact: both the unlikelihood of submission at that precise point in
time, and the context switch being a significant fraction of the batch
runtime, make the effect minuscule in practice. It is also not
foolproof even for gem_ctx_switch:

kbl
  ctx1 -> idle -> ctx2: ~25us
  ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~53us
  ctx1 -> idle -> ctx1 -> ctx2 (patched): 30-40us

bxt
  ctx1 -> idle -> ctx2: ~40us
  ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~80us
  ctx1 -> idle -> ctx1 -> ctx2 (patched): 60-70us

So consider this more of a plea for ideas: why does bdw behave better?
Are we missing a flag, a fox or a chicken?
-Chris
---
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 36050f085071..682268d4249d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -711,6 +711,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			GEM_BUG_ON(last->hw_context == rq->hw_context);
 
+			/*
+			 * Avoid reloading the previous context if we
+			 * know it has just completed and we want
+			 * to switch over to a new context. The CS
+			 * interrupt is likely waiting for us to
+			 * release the local irq lock and so we will
+			 * proceed with the submission momentarily,
+			 * which is quicker than reloading the context
+			 * on the gpu.
+			 */
+			if (!submit &&
+			    intel_engine_signaled(engine,
+						  last->global_seqno)) {
+				GEM_BUG_ON(!list_is_first(&rq->sched.link,
+							  &p->requests));
+				return;
+			}
+
 			if (submit)
 				port_assign(port, last);
 			port++;
-- 
2.18.0