On Tue, Feb 26, 2013 at 05:09:03PM +0200, Mika Kuoppala wrote: > Chris Wilson <chris at chris-wilson.co.uk> writes: > > > On Tue, Feb 26, 2013 at 01:05:07PM +0200, Mika Kuoppala wrote: > >> Instead of relying in acthd, track ring seqno progression > >> to detect if ring has hung. > > > > This needs a comment that it has a user visible side-effect of limiting > > batches to a maximum of 1.5s runtime. Before, that limit was softer in > > that we had a chance to spot that the GPU was busy - even if we could be > > fooled by an infinite loop. > > As the current code has: > > > if (dev_priv->gpu_error.hangcheck_count++ > 1) > > to trigger the hang, so it will trigger at 3rd timer call (4.5seconds) > > With my patch it will be 4.5 seconds if rings are waiting > and 3 seconds if there is non waiting ring involved. > > As i can't explain what is the added benefit to declare hang earlier > if there is a non waiting ring, do you want me to simplify this > to just declare hang if there is no progress in 4.5 seconds in both cases? > To match the old trigger timing. Yes, that would be good if you can simplify the logic so that it does not react differently for no apparent reason. I would prefer the 3s timeout. > > Did you write an i-g-t for detecting a ring of chained batchbuffers? > > -Chris > > Yes, you can find it in here: > > https://github.com/mkuoppal/intel-gpu-tools/commit/1df7d49ff9ecedf9c55933a9e36b1eb41f07abc6 I definitely add an exercise for an infinite loop that was not caught by the current hangcheck, and add a fixes: note to this changelog. (Couldn't spot that explicit case in the reset-status test, and it can be added independently of this series as an demonstration of a known breakage.) -Chris -- Chris Wilson, Intel Open Source Technology Centre