On Wed, Apr 19, 2017 at 05:09:46PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > If we have a long period of idleness, we turn off the hangcheck timer
> > and stop polling the hardware. Before we restart the hangcheck, we
> > should clear the previous timestamps to prevent us thinking that the
> > engine was stalled for a long time, if the seqno were manipulated
> > carefully (such as the repeating patterns in gem_exec_whisper).
> >
> > It should have no impact upon normal use.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index b0ca0c4c70d9..a74decca5109 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  	int busy_count = 0;
> >
> >  	if (!i915.enable_hangcheck)
> > -		return;
> > +		goto disarm_hangcheck;
> >
> >  	if (!READ_ONCE(dev_priv->gt.awake))
> > -		return;
> > +		goto disarm_hangcheck;
> >
> >  	if (i915_terminally_wedged(&dev_priv->gpu_error))
> > -		return;
> > +		goto disarm_hangcheck;
> >
> >  	/* As enabling the GPU requires fairly extensive mmio access,
> >  	 * periodically arm the mmio checker to see if we are triggering
> > @@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  	hangcheck_declare_hang(dev_priv, hung, stuck);
> >
> >  	/* Reset timer in case GPU hangs without another request being added */
> > -	if (busy_count)
> > +	if (busy_count) {
> >  		i915_queue_hangcheck(dev_priv);
>
> Now, if we don't have a waiter, we always init hangcheck, and thus
> we never detect a hang, as demonstrated by gem_busy@basic-default-hang.
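The failure mode described above can be illustrated with a deliberately simplified model. This is not the i915 code: every name below (`struct engine_model`, `hangcheck_tick`, `simulate_stuck_engine`, `HANG_THRESHOLD`) is hypothetical. The sketch assumes hangcheck declares a hang once the completed seqno fails to advance for a fixed number of consecutive passes while work is outstanding, and shows how re-initialising that sampled state on every pass for an engine without a waiter means a stuck waiterless engine is never reported. The helpers have no `main()` and are meant to be called from a small test driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: consecutive no-progress passes before a hang is declared. */
#define HANG_THRESHOLD 3

/* Hypothetical, heavily simplified per-engine state; not the i915 structs. */
struct engine_model {
	unsigned int submitted_seqno; /* last seqno handed to the engine */
	unsigned int hw_seqno;        /* last seqno the engine has completed */
	unsigned int last_seqno;      /* hw_seqno sampled at the previous pass */
	unsigned int stall_checks;    /* consecutive passes without progress */
};

/*
 * One periodic hangcheck pass. Returns true if a hang is declared.
 * If reset_without_waiter is set, an engine with no waiter has its
 * sampled state re-initialised on every pass, mimicking the behaviour
 * described in the review above.
 */
static bool hangcheck_tick(struct engine_model *e, bool has_waiter,
			   bool reset_without_waiter)
{
	bool work_pending = e->hw_seqno != e->submitted_seqno;

	if (reset_without_waiter && !has_waiter) {
		/* Forget any accumulated stall: progress is never compared. */
		e->last_seqno = e->hw_seqno;
		e->stall_checks = 0;
		return false;
	}

	/* Count passes where work is outstanding but nothing completed. */
	if (work_pending && e->hw_seqno == e->last_seqno)
		e->stall_checks++;
	else
		e->stall_checks = 0;

	e->last_seqno = e->hw_seqno;
	return e->stall_checks >= HANG_THRESHOLD;
}

/*
 * Drive a stuck engine (seqno 5 completed, 10 submitted, no further
 * progress) that has no waiter, and report whether a hang is declared
 * within the given number of passes.
 */
static bool simulate_stuck_engine(bool reset_without_waiter, int ticks)
{
	struct engine_model e = {
		.submitted_seqno = 10, .hw_seqno = 5,
		.last_seqno = 5, .stall_checks = 0,
	};

	assert(ticks >= 0);
	for (int i = 0; i < ticks; i++) {
		if (hangcheck_tick(&e, false, reset_without_waiter))
			return true;
	}
	return false;
}
```

With state kept across passes, `simulate_stuck_engine(false, 10)` declares a hang on the third pass; with the per-pass reset, `simulate_stuck_engine(true, 10)` never does, however many passes run.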
>
> I suggest we decouple the waiters completely from hangcheck:
>
> -	const bool busy = intel_engine_has_waiter(engine);
> +	const bool busy = engine->timeline->inflight_seqnos;

inflight_seqnos isn't a good choice either, as having seqnos in flight
doesn't mean the engine is active yet.

The only issue with this patch was that resetting hangcheck.seqno nerfed
the waiterless hangcheck. That turned out to be a very bad idea.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx