On Fri, Jan 08, 2016 at 04:54:19PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Tue, Dec 01, 2015 at 05:56:12PM +0200, Mika Kuoppala wrote:
> >> If head seems stuck and the engine in question is rcs,
> >> inspect subunit state transitions from undone to done
> >> before deciding that this really is a hang instead of limited
> >> progress. Only account the transitions of subunits from
> >> undone to done once, to prevent unstable subunit states
> >> from keeping us falsely active.
> >>
> >> As this adds one extra step to the hangcheck heuristics
> >> before a hang is declared, it adds 1500ms to the time to detect
> >> a hang on the render ring, for a total of 7500ms. We could sample
> >> the subunit states on the first head stuck condition but
> >> decided not to, in order to mimic the old behaviour. This
> >> way the promotion order seqno > acthd > instdone
> >> is applied consistently.
> >>
> >> v2: Deal with unstable done states (Arun)
> >>     Clear instdone progress on head and seqno movement (Chris)
> >>     Report raw and accumulated instdones in debugfs (Chris)
> >>     Return HANGCHECK_ACTIVE on undone->done
> >>
> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
> >> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Cc: Dave Gordon <david.s.gordon@xxxxxxxxx>
> >> Cc: Daniel Vetter <daniel@xxxxxxxx>
> >> Cc: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >
> > I feel slightly dubious about discarding the 1->0 transitions (as that
> > just means that a shared function that was previously idle is now in use
> > again), but if they truly do fluctuate randomly, then accumulating
> > should mean we eventually escape.
> >
> > Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>
> Queued for -next, thanks for the review.
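The accumulation rule in the quoted commit message (count each subunit's undone->done transition only once, and clear the accumulator on head or seqno movement) can be modelled in a few lines of C. This is a simplified, standalone sketch loosely following the `subunits_stuck()` named in the diff below, not the i915 code: `struct hangcheck_state`, `NUM_INSTDONE_REGS` and `hangcheck_reset_subunits()` are hypothetical, the INSTDONE samples are passed in rather than read from hardware, and the restriction to the render engine is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register count for this sketch; real hardware differs per gen. */
#define NUM_INSTDONE_REGS 4

/* Hypothetical per-engine state: a set bit means that subunit has already
 * been seen reaching "done" since the last reset. */
struct hangcheck_state {
	uint32_t instdone[NUM_INSTDONE_REGS];
};

/* Call on head or seqno movement: forget all previously seen transitions. */
static void hangcheck_reset_subunits(struct hangcheck_state *hc)
{
	memset(hc->instdone, 0, sizeof(hc->instdone));
}

/*
 * Returns true if no subunit made a new undone->done transition since the
 * last reset. Bits are only ever ORed into the accumulator, so 1->0
 * transitions are ignored and a subunit that flips back to done again is
 * not counted a second time.
 */
static bool subunits_stuck(struct hangcheck_state *hc,
			   const uint32_t sample[NUM_INSTDONE_REGS])
{
	bool stuck = true;
	int i;

	assert(hc != NULL && sample != NULL);

	for (i = 0; i < NUM_INSTDONE_REGS; i++) {
		uint32_t merged = hc->instdone[i] | sample[i];

		/* A bit not yet in the accumulator is a first-time transition. */
		if (merged != hc->instdone[i])
			stuck = false;

		hc->instdone[i] = merged;
	}

	return stuck;
}
```

Because the accumulator only grows between resets, a fluctuating done bit can report progress at most once per reset, after which the engine is treated as stuck again; that is the "eventually escape" property discussed in the quoted review.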
Hmm, you just reminded me that we now have a problem with HEAD running wild: we only detect a loop when it goes past 1<<48 (and we only increment the score when we loop). Something like:

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b2ef2d0c211b..4fe28a0301f2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2949,21 +2949,15 @@ static enum intel_engine_hangcheck_action
 head_stuck(struct intel_engine_cs *ring, u64 acthd)
 {
 	if (acthd != ring->hangcheck.acthd) {
-		/* Clear subunit states on head movement */
 		memset(ring->hangcheck.instdone, 0,
 		       sizeof(ring->hangcheck.instdone));
 
-		if (acthd > ring->hangcheck.max_acthd) {
-			ring->hangcheck.max_acthd = acthd;
-			return HANGCHECK_ACTIVE;
-		}
-
 		return HANGCHECK_ACTIVE_LOOP;
 	}
 
 	if (!subunits_stuck(ring))
-		return HANGCHECK_ACTIVE;
+		return HANGCHECK_ACTIVE_LOOP;
 
 	return HANGCHECK_HUNG;
 }
 
@@ -3117,7 +3111,9 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			 * attempts across multiple batches.
 			 */
 			if (ring->hangcheck.score > 0)
-				ring->hangcheck.score--;
+				ring->hangcheck.score -= HUNG;
+			if (ring->hangcheck.score < 0)
+				ring->hangcheck.score = 0;
 
 			/* Clear head and subunit states on seqno movement */
 			ring->hangcheck.acthd = ring->hangcheck.max_acthd = 0;

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
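To see what the proposed `score -= HUNG` with the clamp at 0 does, here is a standalone model of the scoring loop. It is a sketch under assumptions, not i915 code: `struct hc_score`, `hc_seqno_progress()`, `hc_seqno_stuck()`, `enum hc_action` and the values of `BUSY`, `HUNG` and `SCORE_HUNG_THRESHOLD` are hypothetical placeholders. It assumes, as in the diff above, that any head movement while the seqno is stuck is reported as an active loop, so the score climbs by `BUSY` every period instead of only when HEAD wraps.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative weights and threshold (hypothetical values for this sketch). */
#define BUSY 1
#define HUNG 20
#define SCORE_HUNG_THRESHOLD 31

enum hc_action { HC_ACTIVE_LOOP, HC_HUNG };

struct hc_score {
	int score; /* must be signed, or the "< 0" clamp below can never fire */
};

/* Seqno advanced: forgive a whole HUNG's worth of score, never below 0. */
static void hc_seqno_progress(struct hc_score *hc)
{
	if (hc->score > 0) {
		hc->score -= HUNG;
		if (hc->score < 0)
			hc->score = 0;
	}
}

/* Seqno stuck: charge the engine according to the head_stuck() verdict.
 * Returns true once the accumulated score reaches the hang threshold. */
static bool hc_seqno_stuck(struct hc_score *hc, enum hc_action action)
{
	assert(action == HC_ACTIVE_LOOP || action == HC_HUNG);
	hc->score += (action == HC_HUNG) ? HUNG : BUSY;
	return hc->score >= SCORE_HUNG_THRESHOLD;
}
```

With these placeholder values, an engine whose HEAD keeps moving without completing a request is declared hung after 31 periods rather than never, while each seqno advance forgives up to `HUNG`; the clamp keeps the score from going negative and banking credit against a later hang.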