Quoting Tvrtko Ursulin (2020-07-16 16:37:25)
> 
> On 16/07/2020 12:33, Chris Wilson wrote:
> > One more complication of preempt-to-busy with respect to the virtual
> > engine is that we may have retired the last request along the virtual
> > engine at the same time as preparing to submit the completed request to
> > a new engine. That submit will be shortcircuited, but not before we have
> > updated the context with the new register offsets and marked the virtual
> > engine as bound to the new engine (by calling swap on ve->siblings[]).
> > As we may have just retired the completed request, we may also be in the
> > middle of calling intel_context_exit() to turn off the power management
> 
> virtual_context_exit
> 
> > associated with the virtual engine, and that in turn walks the
> > ve->siblings[]. If we happen to call swap() on the array as we walk, we
> > will call intel_engine_pm_put() twice on the same engine.
> > 
> > In this patch, we prevent this by only updating the bound engine after a
> > successful submission which weeds out the already completed requests.
> > 
> > Alternatively, we could walk a non-volatile array for the pm, such as
> > using the engine->mask. The small advantage to performing the update
> > after the submit is that we then only have to do a swap for active
> > requests.
> > 
> > Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> > References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 65 ++++++++++++++++++-----------
> >  1 file changed, 40 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 88a5c155154d..5e8278e8ac79 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1805,6 +1805,33 @@ static bool virtual_matches(const struct virtual_engine *ve,
> >  	return true;
> >  }
> >  
> > +static void virtual_xfer_context(struct virtual_engine *ve,
> > +				 struct intel_engine_cs *engine)
> > +{
> > +	unsigned int n;
> > +
> > +	if (likely(engine == ve->siblings[0]))
> > +		return;
> > +
> > +	GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > +	if (!intel_engine_has_relative_mmio(engine))
> > +		virtual_update_register_offsets(ve->context.lrc_reg_state,
> > +						engine);
> > +
> > +	/*
> > +	 * Move the bound engine to the top of the list for
> > +	 * future execution. We then kick this tasklet first
> > +	 * before checking others, so that we preferentially
> > +	 * reuse this set of bound registers.
> > +	 */
> > +	for (n = 1; n < ve->num_siblings; n++) {
> > +		if (ve->siblings[n] == engine) {
> > +			swap(ve->siblings[n], ve->siblings[0]);
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> >  #define for_each_waiter(p__, rq__) \
> >  	list_for_each_entry_lockless(p__, \
> >  				     &(rq__)->sched.waiters_list, \
> > @@ -2253,35 +2280,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> >  			WRITE_ONCE(rq->engine, engine);
> >  
> > -			if (engine != ve->siblings[0]) {
> > -				u32 *regs = ve->context.lrc_reg_state;
> > -				unsigned int n;
> > -
> > -				GEM_BUG_ON(READ_ONCE(ve->context.inflight));
> > -
> > -				if (!intel_engine_has_relative_mmio(engine))
> > -					virtual_update_register_offsets(regs,
> > -									engine);
> > -
> > +			if (__i915_request_submit(rq)) {
> >  				/*
> > -				 * Move the bound engine to the top of the list
> > -				 * for future execution. We then kick this
> > -				 * tasklet first before checking others, so that
> > -				 * we preferentially reuse this set of bound
> > -				 * registers.
> > +				 * Only after we confirm that we will submit
> > +				 * this request (i.e. it has not already
> > +				 * completed), do we want to update the context.
> > +				 *
> > +				 * This serves two purposes. It avoids
> > +				 * unnecessary work if we are resubmitting an
> > +				 * already completed request after timeslicing.
> > +				 * But more importantly, it prevents us altering
> > +				 * ve->siblings[] on an idle context, where
> > +				 * we may be using ve->siblings[] in
> > +				 * virtual_context_enter / virtual_context_exit.
> >  				 */
> > -				for (n = 1; n < ve->num_siblings; n++) {
> > -					if (ve->siblings[n] == engine) {
> > -						swap(ve->siblings[n],
> > -						     ve->siblings[0]);
> > -						break;
> > -					}
> > -				}
> > -
> > +				virtual_xfer_context(ve, engine);
> >  				GEM_BUG_ON(ve->siblings[0] != engine);
> > -			}
> >  
> > -			if (__i915_request_submit(rq)) {
> >  				submit = true;
> >  				last = rq;
> >  			}
> 
> This was nasty indeed. How did you manage to find this?
Staring at the traces, playing games of what-if. The clue was a bunch of
retirement and parking (too early due to the bad wakeref count) around
the very occasional bouncing virtual engine.

> I think it is okay to do the update once we know submit is going ahead,
> but in the light of this "gotcha" I think it would also be good to start
> using for_each_engine_masked in virtual_context_enter/exit.

I am tempted, but I want to avoid further reliance on ve->base.gt. That
may be a moot point already, but I hope not.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx