On Wed, 11 Apr 2012 09:18:15 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Tue, 10 Apr 2012 16:59:11 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> > On Tue, 10 Apr 2012 17:00:41 +0100
> > Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > > On the first instance we just wish to kick the waiters and see if that
> > > terminates the wait conditions. If it does not, then we do not want to
> > > keep retrying without ever making any forward progress and becoming
> > > stuck in a hangcheck loop.
> > >
> > > Reported-and-tested-by: Lukas Hejtmanek <xhejtman at fi.muni.cz>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48209
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >
> > I'm still confused about the problem we are purportedly fixing.
> >
> > This should happen if we've missed an IRQ (or the watchdog fired too
> > soon), and then fires again before the thread has actually woken up to
> > realize that it missed the first IRQ?
> >
> > As for extracting the kick_ring bit of code for core hangcheck_elapsed,
> > that looks fine. I just don't quite understand the exact problem this
> > solves, and can't envision how we hit this case it seems the patch will
> > fix.
>
> Sure, just look at the bug report for the garbage we wrote into the
> ringbuffers and how we ended up in an indefinite wait. This is not defense
> against normal behaviour but the driver screwing up.
> -Chris
>

In that case this is
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>

Though I am still pretty surprised that we have even seen this :|