On Tue, Jun 11, 2013 at 04:37:26PM +0200, Daniel Vetter wrote:
> On Tue, Jun 11, 2013 at 4:16 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Tue, Jun 11, 2013 at 04:05:41PM +0200, Daniel Vetter wrote:
> >> On Tue, Jun 11, 2013 at 02:40:19PM +0100, Chris Wilson wrote:
> >> > Not sure what you mean here. The check is fairly easy and has gotten us
> >> > out of many a hole before, and makes for a good defense. So how would
> >> > you want to fine tune it?
> >>
> >> Something like the MI_WAIT hangcheck score, but like I've said as long as
> >> we don't have a real-world bug report (some poor guy disabled semaphores
> >> maybe due to the snb issue?) not worth bothering at all.
> >>
> >> I've just thought that if we're unlucky and miss the interrupt a few times
> >> in a row we don't want to accidentally declare the gpu dead.
> >
> > I regarded it as a driver bug, that a GPU reset would not help. So the
> > choice is between limping along with the hopefully occasional stall, or
> > terminating the GPU with extreme prejudice. I chose the former, hence
> > did not increment the hangcheck.
>
> Hm, maybe I'm reading the logic wrongly, but don't we add a += HUNG
> score now for a stuck, but idle ring? So pretty short of declaring the
> thing dead?

Yeah... Didn't mean to do that, as all the time I was thinking "don't
hang here, this is our bug not userspace's".

> Ofc there's the slow decline if the gpu isn't actually
> dead, but if we have more than 1 such stall every HUNG (=20) hangcheck
> times we'll eventually declare it dead despite the limping along.
>
> Anyway nothing to really worry about, just wanted to check my
> understanding here.

Looks like my fingers mutinied; and I am the one confused.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
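
For reference, a rough sketch of the arithmetic Daniel describes in the quoted text: a stuck-but-idle ring adds HUNG (=20) to the score, and the score otherwise declines slowly. This is a toy model, not the driver code. The decline of one point per hangcheck, the FIRE_THRESHOLD value and the function name are all assumptions made for illustration; only HUNG = 20 comes from the thread.

```c
#include <assert.h>

#define HUNG_SCORE      20  /* from the thread: HUNG (=20) */
#define FIRE_THRESHOLD  30  /* assumption: score above this means "declare dead" */

/*
 * Toy model (hypothetical helper, not an i915 function).
 *
 * Every stall_period-th hangcheck sees a stuck-but-idle ring and adds
 * HUNG_SCORE. Every other hangcheck sees progress and decays the score
 * by one (assumed decay rate), never going below zero.
 *
 * Returns the hangcheck number at which the score first exceeds
 * FIRE_THRESHOLD, or -1 if that does not happen within max_checks.
 */
int checks_until_declared_hung(int stall_period, int max_checks)
{
    int score = 0;

    assert(stall_period > 0);

    for (int check = 1; check <= max_checks; check++) {
        if (check % stall_period == 0)
            score += HUNG_SCORE;    /* missed interrupt: ring stuck but idle */
        else if (score > 0)
            score--;                /* slow decline while the GPU makes progress */

        if (score > FIRE_THRESHOLD)
            return check;
    }

    return -1;
}
```

Under these assumptions, one stall every 20 hangchecks nets +1 per cycle (20 added, 19 decayed), so the score creeps upward until the GPU is declared dead, while one stall every 21 hangchecks decays back to zero each time and never trips the threshold.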