Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Check that there was not a late recovery between us declaring the GPU
> hung and processing the reset. If the GPU did recover by itself, let the
> request remain on the active list and see if it hangs again!
>

Did you see this in action?

Makes sense to recheck after reset. I don't remember how TDR will deal
with multiple resets on the same engine, but we should start tracking
the seqno that causes them and make sure we don't get stuck by replaying
the same one.

Do we check the banning on resubmission and/or do we trust that the
breadcrumb update always succeeds?

I envision that if we get multiple resets on the same seqno, we just
write the breadcrumbs through the CPU and move on. But let's hope we
don't need to and the GPU breadcrumbs are always enough.

Regardless, it's an improvement and should weed out false positives on
some hangs.

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

-Mika

> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0cae8acdf906..a89a88922448 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2589,6 +2589,9 @@ static void i915_gem_reset_engine(struct intel_engine_cs *engine)
>  		return;
>  
>  	ring_hung = engine->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG;
> +	if (engine->hangcheck.seqno != intel_engine_get_seqno(engine))
> +		ring_hung = false;
> +
>  	i915_set_reset_status(request->ctx, ring_hung);
>  	if (!ring_hung)
>  		return;
> -- 
> 2.9.3
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
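
For readers following the thread, the recheck added by the patch can be modelled outside the driver. This is only a rough sketch, not the i915 code: `struct engine_model`, `read_seqno()`, `still_hung()` and the `MODEL_SCORE_RING_HUNG` value are hypothetical stand-ins for `struct intel_engine_cs`, `intel_engine_get_seqno()`, the logic in `i915_gem_reset_engine()` and `HANGCHECK_SCORE_RING_HUNG` (whose real value is not shown in the diff).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; the real HANGCHECK_SCORE_RING_HUNG is not in the diff. */
#define MODEL_SCORE_RING_HUNG 31

/* Hypothetical, heavily reduced stand-in for struct intel_engine_cs. */
struct engine_model {
	uint32_t hw_seqno;        /* last breadcrumb the GPU has written */
	uint32_t hangcheck_seqno; /* seqno sampled when hangcheck declared the hang */
	int hangcheck_score;      /* accumulated hangcheck score */
};

/* Stand-in for intel_engine_get_seqno(): read the current breadcrumb. */
static uint32_t read_seqno(const struct engine_model *engine)
{
	return engine->hw_seqno;
}

/*
 * Mirror of the patched decision: the engine counts as hung only if the
 * score crossed the threshold AND the seqno has not advanced since the
 * hang was declared. Any progress is treated as a late recovery.
 */
static bool still_hung(const struct engine_model *engine)
{
	bool ring_hung = engine->hangcheck_score >= MODEL_SCORE_RING_HUNG;

	if (engine->hangcheck_seqno != read_seqno(engine))
		ring_hung = false;

	return ring_hung;
}
```

In this model, an engine whose `hw_seqno` has moved past `hangcheck_seqno` by the time the reset is processed is reported as not hung, so the request would stay on the active list and be rechecked, which is the false-positive case the patch is meant to weed out.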