On Tue, Oct 15, 2024 at 08:27:10PM +0530, Nitin Gote wrote:
> we see an issue where resets fails because the engine resumes
> from an incorrect RING_HEAD. Since the RING_HEAD doesn't point
> to the remaining requests to re-run, but may instead point into
> the uninitialised portion of the ring, the GPU may be then fed
> invalid instructions from a privileged context, oft pushing the
> GPU into an unrecoverable hang.
>
> If at first the write doesn't succeed, try, try again.
>
> v2: Avoid unnecessary timeout macro (Andi)
>
> v3: Correct comment format (Andi)
>
> v4: Make it generic for all platform as it won't impact (Chris)
>
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/5432
> Testcase: igt/i915_selftest/hangcheck

The referenced HSW-specific gitlab issue was closed in 2022 and hadn't
been active for a while before that.  This patch from Chris was
originally posted as an attachment on that gitlab issue asking if it
helped, but nobody responded that it did/didn't improve the situation so
it may or may not have been relevant to what was originally reported in
that ticket.

Looking in cibuglog, the most similar failures I see today are the ones
getting associated with issue #12310.  I.e.,

        <3> [220.415493] i915 0000:00:02.0: [drm] *ERROR* failed to set rcs0 head to zero ctl 00000000 head 00001db8 tail 00000000 start 7fffa000

Are you trying to solve that CI issue or is there a different
user-submitted report somewhere that this patch is trying to address?

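For anyone following along, the "write, read back, retry" idea in the
patch boils down to roughly the pattern below.  This is only a standalone
sketch, not driver code: fake_reg, fake_write(), fake_read() and
write_verified() are hypothetical names invented for illustration, the
register is simulated in memory, and the loop is bounded by an attempt
count rather than the ~2ms ktime deadline the patch uses, purely so the
sketch behaves deterministically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a register that may ignore some writes. */
struct fake_reg {
	uint32_t value;		/* what a read currently returns */
	unsigned int drop;	/* number of upcoming writes to silently drop */
};

/* Simulated MMIO write: dropped while reg->drop is non-zero. */
static void fake_write(struct fake_reg *reg, uint32_t v)
{
	if (reg->drop) {
		reg->drop--;
		return;
	}
	reg->value = v;
}

/* Simulated MMIO read. */
static uint32_t fake_read(const struct fake_reg *reg)
{
	return reg->value;
}

/*
 * Write v and read it back, retrying up to max_tries times.
 * Returns true as soon as the readback matches, false if no attempt
 * stuck.  (Assumption: an attempt bound replaces the patch's time bound.)
 */
static bool write_verified(struct fake_reg *reg, uint32_t v,
			   unsigned int max_tries)
{
	assert(max_tries > 0);

	for (unsigned int i = 0; i < max_tries; i++) {
		fake_write(reg, v);
		if (fake_read(reg) == v)
			return true;
	}

	return false;
}
```

A caller would treat a false return as "the register never took the
value" and bail out instead of continuing with a stale head.
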

Matt

> Signed-off-by: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
> Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx>
> ---
>  .../gpu/drm/i915/gt/intel_ring_submission.c   | 31 ++++++++++++++++---
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 72277bc8322e..b6b25fe22cb8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -192,6 +192,7 @@ static bool stop_ring(struct intel_engine_cs *engine)
>  static int xcs_resume(struct intel_engine_cs *engine)
>  {
>  	struct intel_ring *ring = engine->legacy.ring;
> +	ktime_t kt;
>
>  	ENGINE_TRACE(engine, "ring:{HEAD:%04x, TAIL:%04x}\n",
>  		     ring->head, ring->tail);
> @@ -230,9 +231,27 @@ static int xcs_resume(struct intel_engine_cs *engine)
>  	set_pp_dir(engine);
>
>  	/* First wake the ring up to an empty/idle ring */
> -	ENGINE_WRITE_FW(engine, RING_HEAD, ring->head);
> +	for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC);
> +			ktime_before(ktime_get(), (kt)); cpu_relax()) {
> +		/*
> +		 * In case of resets fails because engine resumes from
> +		 * incorrect RING_HEAD and then GPU may be then fed
> +		 * to invalid instrcutions, which may lead to unrecoverable
> +		 * hang. So at first write doesn't succeed then try again.
> +		 */
> +		ENGINE_WRITE_FW(engine, RING_HEAD, ring->head);
> +		if (ENGINE_READ_FW(engine, RING_HEAD) == ring->head)
> +			break;
> +	}
> +
>  	ENGINE_WRITE_FW(engine, RING_TAIL, ring->head);
> -	ENGINE_POSTING_READ(engine, RING_TAIL);
> +	if (ENGINE_READ_FW(engine, RING_HEAD) != ENGINE_READ_FW(engine, RING_TAIL)) {
> +		ENGINE_TRACE(engine, "failed to reset empty ring: [%x, %x]: %x\n",
> +			     ENGINE_READ_FW(engine, RING_HEAD),
> +			     ENGINE_READ_FW(engine, RING_TAIL),
> +			     ring->head);
> +		goto err;
> +	}
>
>  	ENGINE_WRITE_FW(engine, RING_CTL,
>  			RING_CTL_SIZE(ring->size) | RING_VALID);
> @@ -241,12 +260,16 @@ static int xcs_resume(struct intel_engine_cs *engine)
>  	if (__intel_wait_for_register_fw(engine->uncore,
>  					 RING_CTL(engine->mmio_base),
>  					 RING_VALID, RING_VALID,
> -					 5000, 0, NULL))
> +					 5000, 0, NULL)) {
> +		ENGINE_TRACE(engine, "failed to restart\n");
>  		goto err;
> +	}
>
> -	if (GRAPHICS_VER(engine->i915) > 2)
> +	if (GRAPHICS_VER(engine->i915) > 2) {
>  		ENGINE_WRITE_FW(engine,
>  				RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING));
> +		ENGINE_POSTING_READ(engine, RING_MI_MODE);
> +	}
>
>  	/* Now awake, let it get started */
>  	if (ring->tail != ring->head) {
> --
> 2.25.1
>

--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation