[PATCH 04/13] drm/i915: track ring progression using seqnos

chris at chris-wilson.co.uk (Chris Wilson) · Tue, 26 Feb 2013 15:31:51 +0000

On Tue, Feb 26, 2013 at 05:09:03PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > On Tue, Feb 26, 2013 at 01:05:07PM +0200, Mika Kuoppala wrote:
> >> Instead of relying in acthd, track ring seqno progression
> >> to detect if ring has hung.
> >
> > This needs a comment that it has a user visible side-effect of limiting
> > batches to a maximum of 1.5s runtime. Before, that limit was softer in
> > that we had a chance to spot that the GPU was busy - even if we could be
> > fooled by an infinite loop.
> 
> As the current code has:
> 
> > if (dev_priv->gpu_error.hangcheck_count++ > 1)
> 
> to trigger the hang, so it will trigger at 3rd timer call (4.5seconds)
> 
> With my patch it will be 4.5 seconds if rings are waiting 
> and 3 seconds if there is non waiting ring involved.
> 
> As i can't explain what is the added benefit to declare hang earlier
> if there is a non waiting ring, do you want me to simplify this
> to just declare hang if there is no progress in 4.5 seconds in both cases?
> To match the old trigger timing.

Yes, that would be good if you can simplify the logic so that it does
not react differently for no apparent reason. I would prefer the 3s
timeout.

> > Did you write an i-g-t for detecting a ring of chained batchbuffers?
> > -Chris
> 
> Yes, you can find it in here:
> 
> https://github.com/mkuoppal/intel-gpu-tools/commit/1df7d49ff9ecedf9c55933a9e36b1eb41f07abc6

I definitely add an exercise for an infinite loop that was not caught by
the current hangcheck, and add a fixes: note to this changelog.
(Couldn't spot that explicit case in the reset-status test, and it can
be added independently of this series as an demonstration of a known
breakage.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre