Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > Before a reset, we set the STOP_RING bit of RING_MI_MODE to freeze the > engine. However, sometimes we observe that upon restart, the engine > freezes again with the STOP_RING bit still asserted. By inspection, we > know that the register itself is cleared by the GPU reset, so that bit > must be preserved inside the context image and reloaded from there. A > simple fix (as the RING_MI_MODE is at a fixed offset in a valid context) > is to clobber the STOP_RING bit inside the image before replaying the > request. > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Cc: Michel Thierry <michel.thierry@xxxxxxxxx> > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> > --- > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++-- > drivers/gpu/drm/i915/intel_lrc_reg.h | 2 ++ > 2 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 3f90c74038ef..37fe842de639 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1918,6 +1918,20 @@ static void execlists_reset(struct intel_engine_cs *engine, > > spin_unlock_irqrestore(&engine->timeline.lock, flags); > > + if (!request) > + return; > + > + regs = request->hw_context->lrc_reg_state; > + > + /* > + * After a reset, the context may have preserved the STOP bit > + * of RING_MI_MODE we used to freeze the active engine before > + * the reset. If that bit is restored the ring stops instead > + * of being executed. > + */ > + regs[CTX_MI_MODE + 1] |= STOP_RING << 16; > + regs[CTX_MI_MODE + 1] &= ~STOP_RING; Ok, I did go and pondered if this is truely the simplest way. Forcing the start outside a context would not be simpler and would be slower. Nice find and it will has potential to explain our some troubles of re-enabling engines. Did you find out this by looking at the error states or what? Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > + > /* > * If the request was innocent, we leave the request in the ELSP > * and will try to replay it on restarting. The context image may > @@ -1929,7 +1943,7 @@ static void execlists_reset(struct intel_engine_cs *engine, > * and have to at least restore the RING register in the context > * image back to the expected values to skip over the guilty request. > */ > - if (!request || request->fence.error != -EIO) > + if (request->fence.error != -EIO) > return; > > /* > @@ -1940,7 +1954,6 @@ static void execlists_reset(struct intel_engine_cs *engine, > * future request will be after userspace has had the opportunity > * to recreate its own state. > */ > - regs = request->hw_context->lrc_reg_state; > if (engine->pinned_default_state) { > memcpy(regs, /* skip restoring the vanilla PPHWSP */ > engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE, > diff --git a/drivers/gpu/drm/i915/intel_lrc_reg.h b/drivers/gpu/drm/i915/intel_lrc_reg.h > index 5ef932d810a7..3b155ecbfa92 100644 > --- a/drivers/gpu/drm/i915/intel_lrc_reg.h > +++ b/drivers/gpu/drm/i915/intel_lrc_reg.h > @@ -39,6 +39,8 @@ > #define CTX_R_PWR_CLK_STATE 0x42 > #define CTX_END 0x44 > > +#define CTX_MI_MODE 0x54 > + > #define CTX_REG(reg_state, pos, reg, val) do { \ > u32 *reg_state__ = (reg_state); \ > const u32 pos__ = (pos); \ > -- > 2.18.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx