[PATCH 0/4] [RFC] use HW watchdog timer

ben at bwidawsk.net (Ben Widawsky) · Tue, 17 Jul 2012 11:51:18 -0700

On Tue, 17 Jul 2012 12:12:39 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> On Mon, 16 Jul 2012 11:51:55 -0700, Ben Widawsky <ben at bwidawsk.net> wrote:
> > Pros:
> > * Potential for per batch, or ring watchdog values. I believe when/if we
> > get to GPGPU workloads, this is particularly interesting.
> > * Batch granularity hang detection. This mostly just makes hang
> > detection and recovery a bit easier IMO.
> > 
> > Cons:
> > * Blit ring doesn't have an interrupt. This means we still need the
> > software watchdog, and it makes hang detection more complex. I've been
> > led to believe future HW *may* have this interrupt.
> > * Semaphores 
> 
> Replacing the black magic for INSTDONE hang detection does seem like a
> sensible plan, but as long as we require the hangcheck timer we are only
> adding code complexity. So there really needs to be a compelling
> advantage for the watchdog, something that we cannot achieve with the
> existing method.

Just to be clear, INSTDONE can go away. I don't think it's valuable for
the blitter.

> 
> For me, the criterion is whether we ever miss a hang or falsely accuse
> the hw of stopping. If I understand the watchdog correctly, it basically
> ensures the batch completes within a certain interval which we can
> codify into the existing hangcheck, so no USP.

Yeah. If we follow the Windows model, I think we just tweak the value
until we find something "good", and just always reset on the timeout
instead of doing instdone-foo.
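
To make the "always reset on the timeout" idea concrete, here is a rough
sketch of a timeout-only check with no INSTDONE sampling. All names
(ring_state, hangcheck_tick, timeout_ms) are made up for illustration and
are not the i915 code; it assumes the check is driven by a periodic timer
and that the caller can tell whether work is pending on the ring.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; not the real i915 structures. */
struct ring_state {
	uint32_t last_seqno;	/* completed seqno seen at the previous check */
	uint32_t stalled_ms;	/* time with work pending and no progress */
};

/*
 * Called from a periodic timer. Returns true when the ring has made no
 * progress for at least timeout_ms while work was pending, i.e. when
 * the caller should reset. No INSTDONE sampling is involved.
 */
static bool hangcheck_tick(struct ring_state *rs, uint32_t completed_seqno,
			   bool work_pending, uint32_t elapsed_ms,
			   uint32_t timeout_ms)
{
	if (!work_pending || completed_seqno != rs->last_seqno) {
		/* Idle or making progress: restart the stall clock. */
		rs->last_seqno = completed_seqno;
		rs->stalled_ms = 0;
		return false;
	}

	rs->stalled_ms += elapsed_ms;
	return rs->stalled_ms >= timeout_ms;
}
```

The only tunable is timeout_ms, which is the value that would get tweaked
until it is "good"; the trade-off is that a legitimately long batch gets
reset just like a hung one.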

> 
> Or is there more magic waiting in the wings?
> -Chris
> 

The magic was only a more straightforward way of finding the batch to
blame, and as I said on IRC, when I started I was planning to gut the
whole SW watchdog; that was the magic.

FWIW I think we may see the interrupt in future products; so it may
still be worth considering whether we want to move in this direction.

-- 
Ben Widawsky, Intel Open Source Technology Center