Quoting Tvrtko Ursulin (2020-10-14 10:06:11)
>
> On 14/10/2020 09:43, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-10-14 09:36:08)
> >>
> >> On 13/10/2020 16:35, Chris Wilson wrote:
> >>> Repeat our sanitychecks from before execution to after execution. One
> >>> expects that if we were to see these, the gpu would already be on fire,
> >>> but the timing may be informative.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>> ---
> >>>   drivers/gpu/drm/i915/gt/intel_lrc.c | 10 +++++++---
> >>>   1 file changed, 7 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 287537089c77..3dbdd5d0cb60 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -1216,7 +1216,8 @@ static void intel_engine_context_out(struct intel_engine_cs *engine)
> >>>
> >>>   static void
> >>>   execlists_check_context(const struct intel_context *ce,
> >>> -                        const struct intel_engine_cs *engine)
> >>> +                        const struct intel_engine_cs *engine,
> >>> +                        const char *when)
> >>>   {
> >>>           const struct intel_ring *ring = ce->ring;
> >>>           u32 *regs = ce->lrc_reg_state;
> >>> @@ -1251,7 +1252,7 @@ execlists_check_context(const struct intel_context *ce,
> >>>                   valid = false;
> >>>           }
> >>>
> >>> -        WARN_ONCE(!valid, "Invalid lrc state found before submission\n");
> >>> +        WARN_ONCE(!valid, "Invalid lrc state found %s submission\n", when);
> >>>   }
> >>>
> >>>   static void restore_default_state(struct intel_context *ce,
> >>> @@ -1347,7 +1348,7 @@ __execlists_schedule_in(struct i915_request *rq)
> >>>                   reset_active(rq, engine);
> >>>
> >>>           if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> >>> -                execlists_check_context(ce, engine);
> >>> +                execlists_check_context(ce, engine, "before");
> >>>
> >>>           if (ce->tag) {
> >>>                   /* Use a fixed tag for OA and friends */
> >>> @@ -1418,6 +1419,9 @@ __execlists_schedule_out(struct i915_request *rq,
> >>>            * refrain from doing non-trivial work here.
> >>>            */
> >>>
> >>> +        if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> >>> +                execlists_check_context(ce, engine, "after");
> >>> +
> >>
> >> CI failures here are either something super scary or a simple mistake
> >> which I cannot see. Or is engine retire, possibly queued up before,
> >> racing with current schedule_out?
> >
> > It's the unpark while the process_csb is not yet flushed, so we scrub
> > the kernel_context before it is scheduled-out. It could in theory be a
> > real problem with our scrubbing to simulate an issue causing an issue,
> > but the timing is quite slim.
>
> Unpark with unflushed process_csb? I thought maybe you meant park, but
> poisoning is indeed in unpark. But pending process_csb means engine is
> supposed to be unparked already. Or you are saying it went through the
> parked-unparked cycle all with pending process_csb?

Yes. A pending CSB holds a GT wakeref (for the interrupt), not an engine
wakeref. That boils down to the fact that we use the engine parking to
force the context switch with one last submission.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
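
For anyone following along, the ordering discussed in this thread can be
sketched as a toy model. This is not driver code: every name below
(engine_model, model_park, model_unpark, model_process_csb, run_sequence,
CTX_VALID, CTX_POISON) is hypothetical, and the model assumes only what the
thread states, namely that a pending CSB pins the GT but not the engine, and
that unpark poisons the kernel context image.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a sane and a scrubbed context image. */
#define CTX_VALID  0x1u
#define CTX_POISON 0xdeadbeefu

/* Toy model of one engine; it does not mirror the i915 structures. */
struct engine_model {
	int gt_wakeref;                /* held while a CSB event is pending */
	int engine_wakeref;            /* zero means the engine is parked */
	bool csb_pending;              /* interrupt seen, process_csb not yet run */
	unsigned int kernel_ctx_state; /* stands in for the kernel context image */
};

static void model_init(struct engine_model *e)
{
	e->gt_wakeref = 1;
	e->engine_wakeref = 1;
	e->csb_pending = false;
	e->kernel_ctx_state = CTX_VALID;
}

/* Park: one last submission forces the switch to the kernel context. */
static void model_park(struct engine_model *e)
{
	e->csb_pending = true; /* the completion event is still unprocessed */
	e->gt_wakeref++;       /* the pending event pins the GT only */
	e->engine_wakeref = 0; /* nothing stops the engine from parking */
}

/* Unpark: the kernel context image is poisoned (scrubbed). */
static void model_unpark(struct engine_model *e)
{
	e->engine_wakeref = 1;
	e->kernel_ctx_state = CTX_POISON;
}

/* Deferred schedule-out; returns the result of the "after" check. */
static bool model_process_csb(struct engine_model *e)
{
	bool valid = e->kernel_ctx_state != CTX_POISON;

	assert(e->csb_pending);
	e->csb_pending = false;
	e->gt_wakeref--;
	return valid;
}

/* Run park, then unpark and flush in the requested order. */
static bool run_sequence(bool unpark_before_flush)
{
	struct engine_model e;
	bool valid;

	model_init(&e);
	model_park(&e);
	if (unpark_before_flush) {
		model_unpark(&e);              /* scrub happens first */
		valid = model_process_csb(&e); /* check sees the poison */
	} else {
		valid = model_process_csb(&e); /* check sees a sane image */
		model_unpark(&e);
	}
	return valid;
}
```

In this model run_sequence(true) reports an invalid state at schedule-out
and run_sequence(false) does not, which is the park/unpark cycle with a
pending process_csb described above.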