Re: [PATCH 2/2] drm/i915: Reset hangcheck timeouts upon idling

On Wed, Apr 19, 2017 at 05:09:46PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > If we have a long period of idleness, we turn off the hangcheck timer
> > and stop polling the hardware. Before we restart the hangcheck, we
> > should clear the previous timestamps to prevent us thinking that the
> > engine was stalled for a long time, if the seqno were manipulated
> > carefully (such as the repeating patterns in gem_exec_whisper).
> >
> > It should have no impact upon normal use.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index b0ca0c4c70d9..a74decca5109 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  	int busy_count = 0;
> >  
> >  	if (!i915.enable_hangcheck)
> > -		return;
> > +		goto disarm_hangcheck;
> >  
> >  	if (!READ_ONCE(dev_priv->gt.awake))
> > -		return;
> > +		goto disarm_hangcheck;
> >  
> >  	if (i915_terminally_wedged(&dev_priv->gpu_error))
> > -		return;
> > +		goto disarm_hangcheck;
> >  
> >  	/* As enabling the GPU requires fairly extensive mmio access,
> >  	 * periodically arm the mmio checker to see if we are triggering
> > @@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  		hangcheck_declare_hang(dev_priv, hung, stuck);
> >  
> >  	/* Reset timer in case GPU hangs without another request being added */
> > -	if (busy_count)
> > +	if (busy_count) {
> >  		i915_queue_hangcheck(dev_priv);
> 
> Now if we don't have a waiter, we always init hangcheck, and thus
> we never detect a hang in that case, as demonstrated by
> gem_busy@basic-default-hang.
> 
> I suggest we decouple the waiters completely from hangcheck:
> 
> -               const bool busy = intel_engine_has_waiter(engine);
> +               const bool busy = engine->timeline->inflight_seqnos;

inflight_seqnos isn't a good choice either, as it doesn't mean the
engine is active yet. The only issue with this patch was that resetting
hangcheck.seqno nerfed the waiterless hangcheck, which turned out to be
a very bad idea.
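
To illustrate the failure mode described above, here is a minimal toy
model in C. It is only a sketch and not the i915 code: struct toy_engine,
toy_hangcheck_pass(), toy_detects_hang() and TOY_HUNG_THRESHOLD are all
hypothetical names, and the "declare a hang after N passes without seqno
progress" rule is an assumed simplification of the real scoring.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; these are NOT the i915 structures. */
struct toy_engine {
	unsigned int hw_seqno;       /* last seqno the hardware completed */
	unsigned int last_seqno;     /* seqno sampled by the previous pass */
	unsigned int stalled_passes; /* consecutive passes without progress */
	bool has_waiter;             /* is anyone waiting on this engine? */
};

/* Assumed threshold: passes without progress before declaring a hang. */
#define TOY_HUNG_THRESHOLD 3

/* Run one hangcheck pass; returns true if the engine is declared hung. */
static bool toy_hangcheck_pass(struct toy_engine *e, bool reset_if_no_waiter)
{
	if (reset_if_no_waiter && !e->has_waiter) {
		/* Problematic variant: forget all history when waiterless. */
		e->last_seqno = e->hw_seqno;
		e->stalled_passes = 0;
		return false;
	}

	if (e->hw_seqno == e->last_seqno) {
		/* No progress since the previous sample. */
		e->stalled_passes++;
	} else {
		/* Progress was made; take a fresh sample. */
		e->last_seqno = e->hw_seqno;
		e->stalled_passes = 0;
	}

	return e->stalled_passes >= TOY_HUNG_THRESHOLD;
}

/* Simulate a stuck, waiterless engine over a number of passes. */
static bool toy_detects_hang(bool reset_if_no_waiter, int passes)
{
	struct toy_engine e = {
		.hw_seqno = 41,
		.last_seqno = 41,
		.stalled_passes = 0,
		.has_waiter = false,
	};

	for (int i = 0; i < passes; i++) {
		if (toy_hangcheck_pass(&e, reset_if_no_waiter))
			return true;
	}

	return false;
}
```

In this model, toy_detects_hang(false, 10) reports the hang once the
threshold is reached, whereas toy_detects_hang(true, 10) never does,
because the history is wiped on every pass while there is no waiter.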
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



