Re: [RFC 1/3] drm/i915: Watchdog timeout: IRQ handler for gen8+

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 23 Feb 2017 23:38:07 +0000

On Thu, Feb 23, 2017 at 08:57:54PM +0000, Chris Wilson wrote:
> On Thu, Feb 23, 2017 at 11:44:17AM -0800, Michel Thierry wrote:
> > +
> > +		/* Make sure the active request will be marked as guilty */
> > +		engine->hangcheck.stalled = true;
> > +		engine->hangcheck.seqno = intel_engine_get_seqno(engine);
> 
> Just set a flag saying the engine->hangcheck.watchdog = true. Don't
> confuse us. engine->hangcheck.seqno does not give the guilty seqno!

Hmm. So I was expecting a little more work on hangcheck. Once we kick
hangcheck from watchdog, we need to confirm that the active seqno hasn't
advance and ideally we would stop the ring before it does. As it stands
hangcheck.seqno will at least detect when we fail to reset in time
(still has the smell of dos as both the watchdog timeout and the request
duration are under user control), but I'm still uncomfortable with it
being set outside of the timer based hangcheck - at least not unless we
split the different seqno:

engine->hangcheck.watchdog_seqno
engine->hangcheck.timer_seqno
engine->hangcheck.stalled // now a seqno from either path

We also need to teach the standard hangcheck/i915_reset to only reset
selected engines.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx