Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> In what seems remarkably similar to the w/a required to not reload an
> idle context with HEAD==TAIL, it appears we must prevent the HW from
> switching to an idle context in ELSP[1], while simultaneously trying to
> preempt the HW to run another context and a continuation of the idle
> context (which is no longer idle).
>
> We can achieve this by preventing the context from completing while we
> reload a new ELSP (by applying ring_set_paused(1) across the whole of
> dequeue), except this eventually fails due to a lite-restore into a
> waiting semaphore does not generate an ACK. Instead, we try to avoid
> making the GPU do anything too challenging and not submit a new ELSP
> while the interrupts + CSB events appear to have fallen behind the
> completed contexts. We expect it to catch up shortly so we queue another
> tasklet execution and hope for the best.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index b12355048501..5f17ece07858 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	 * of trouble.
>  	 */
>  	active = READ_ONCE(execlists->active);
> -	while ((last = *active) && i915_request_completed(last))
> -		active++;
>
> -	if (last) {
> +	/*
> +	 * In theory we can skip over completed contexts that have not
> +	 * yet been processed by events (as those events are in flight):
> +	 *
> +	 * while ((last = *active) && i915_request_completed(last))
> +	 *	active++;
> +	 *
> +	 * However, the GPU is cannot handle this as it will ultimately

s/is//

I applaud the straightforward nature of this compared to the pausing.
Albeit this seems to have a cost. But this should be quite a rare event
comparatively?

> +	 * find itself trying to jump back into a context it has just
> +	 * completed and barf.
> +	 */
> +
> +	if ((last = *active)) {
>  		if (need_preempt(engine, last, rb)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +

I was pondering the lost tracing and whether you can work it backwards
to this condition.

But I really hope this nails it,
Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

>  			ENGINE_TRACE(engine,
>  				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> @@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			last = NULL;
>  		} else if (need_timeslice(engine, last) &&
>  			   timer_expired(&engine->execlists.timer)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +
>  			ENGINE_TRACE(engine,
>  				     "expired last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> --
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
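
For anyone reading along without the driver source at hand, the control
flow the patch adds can be sketched as a small userspace toy. This is
only an illustration under stated assumptions, not the i915 code: every
toy_* name is hypothetical, want_preempt stands in for the
need_preempt()/expired-timeslice checks, and a counter replaces the real
tasklet_hi_schedule() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i915_request: only tracks completion. */
struct toy_request {
	bool completed;
};

enum toy_action {
	TOY_SUBMIT,	/* nothing blocks writing a new ELSP */
	TOY_DEFER,	/* head of active[] already completed: retry later */
};

/* Hypothetical stand-in for the execlists state used by dequeue. */
struct toy_execlists {
	struct toy_request *const *active;	/* NULL-terminated port array */
	unsigned int tasklet_kicks;		/* simulated tasklet reschedules */
};

/*
 * Mirror of the patched decision: only the first active request is
 * inspected (completed ones are no longer skipped). If we would preempt
 * or timeslice away from a request that has already completed, the CSB
 * events are assumed to be lagging, so we requeue instead of submitting.
 */
static enum toy_action toy_dequeue(struct toy_execlists *el, bool want_preempt)
{
	struct toy_request *last;

	assert(el && el->active);
	last = *el->active;

	if (last && want_preempt && last->completed) {
		el->tasklet_kicks++;	/* stands in for tasklet_hi_schedule() */
		return TOY_DEFER;
	}

	return TOY_SUBMIT;
}
```

With a completed request at the head of active[] and want_preempt set,
toy_dequeue() returns TOY_DEFER and bumps the counter once per call,
which is the cost asked about above: one extra tasklet run each time the
events are behind. In every other case it falls through to TOY_SUBMIT.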