[PATCH 2/4] drm/i915: Only slightly increment hangcheck score if we successfully kick a ring

daniel at ffwll.ch (Daniel Vetter) · Tue, 11 Jun 2013 16:37:26 +0200

On Tue, Jun 11, 2013 at 4:16 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Tue, Jun 11, 2013 at 04:05:41PM +0200, Daniel Vetter wrote:
>> On Tue, Jun 11, 2013 at 02:40:19PM +0100, Chris Wilson wrote:
>> > Not sure what you mean here. The check is fairly easy and has gotten us
>> > out of many a hole before, and makes for a good defense. So how would
>> > you want to fine tune it?
>>
>> Something like the MI_WAIT hangcheck score, but like I've said as long as
>> we don't have a real-world bug report (some poor guy disabled semaphores
>> maybe due to the snb issue?) not worth bothering at all.
>>
>> I've just thought that if we're unlucky and miss the interrupt a few times
>> in a row we don't want to accidentally declare the gpu dead.
>
> I regarded it as a driver bug that a GPU reset would not help. So the
> choice is between limping along with the hopefully occasional stall, or
> terminating the GPU with extreme prejudice. I chose the former, hence
> did not increment the hangcheck.

Hm, maybe I'm misreading the logic, but don't we now add HUNG to the
score for a stuck, but idle ring? So we're just short of declaring the
thing dead? Of course there's the slow decline if the gpu isn't actually
dead, but if we have more than 1 such stall every HUNG (=20) hangcheck
ticks we'll eventually declare it dead despite the limping along.
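
To make the arithmetic concrete, here is a rough userspace sketch of
that accumulation. It is not the driver code: only HUNG (=20) comes
from this thread. DEAD_THRESHOLD, the decay of one point per quiet
hangcheck tick, and the helper name ticks_until_dead() are assumptions
made up for illustration.

```c
#define HUNG           20  /* from this thread: "HUNG (=20)" */
#define DEAD_THRESHOLD 30  /* assumption: hypothetical cutoff, not from the thread */

/*
 * Simulate a ring that is found stuck-but-idle once every stall_period
 * hangcheck ticks. Returns the tick at which the score first exceeds
 * DEAD_THRESHOLD, or -1 if that does not happen within max_ticks.
 */
static int ticks_until_dead(int stall_period, int max_ticks)
{
	int score = 0;
	int tick;

	for (tick = 1; tick <= max_ticks; tick++) {
		if (tick % stall_period == 0)
			score += HUNG;  /* stuck, but idle ring: big bump */
		else if (score > 0)
			score--;        /* assumed slow decline on quiet ticks */

		if (score > DEAD_THRESHOLD)
			return tick;
	}

	return -1;
}
```

Under these assumptions a stall every 10 ticks crosses the threshold on
the second stall, while a stall every 40 ticks decays back to zero in
between and never accumulates. The exact break-even period depends on
the assumed decay and threshold.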

Anyway nothing to really worry about, just wanted to check my
understanding here.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch