Re: [PATCH] drm/i915: Always run hangcheck while the GPU is busy

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 31 Jan 2018 10:09:57 +0000

Quoting Mika Kuoppala (2018-01-31 09:41:35)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > Previously, we relied on only running the hangcheck while somebody was
> > waiting on the GPU, in order to minimise the amount of time hangcheck
> > had to run. (If nobody was watching the GPU, nobody would notice if the
> > GPU wasn't responding -- eventually somebody would care and so kick
> > hangcheck into action.) However, this falls apart from around commit
> > 4680816be336 ("drm/i915: Wait first for submission, before waiting for
> > request completion"), as not all waiters declare themselves to hangcheck
> > and so we could switch off hangcheck and miss GPU hangs even when
> > waiting under the struct_mutex.
> >
> > If we enable hangcheck from the first request submission, and let it run
> > until the GPU is idle again, we forgo all the complexity involved with
> > only enabling around waiters. Instead we have to be careful that we do
> > not declare a GPU hang when idly waiting for the next request to become
> > ready.
> >
> > Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

I rewrote the last paragraph to make clearer what I was hinting at, so
that it reads not as a description of what this patch is doing but as
the background mechanics that this patch relies upon.
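
As a rough illustration of the mechanics described in the quoted commit
message (arm hangcheck on the first submission, keep it running until
the GPU idles, and do not flag a hang while merely waiting for the next
request to become ready), here is a minimal userspace sketch in C. It is
not the i915 code: every name (model_gpu, model_submit and so on) and
the threshold of three stalled ticks are assumptions made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in i915. */
struct model_gpu {
	unsigned int active;        /* requests submitted but not yet retired */
	unsigned int runnable;      /* of those, requests ready to execute */
	uint32_t seqno;             /* last completed seqno (progress marker) */
	uint32_t last_seen;         /* seqno sampled by the previous tick */
	unsigned int stalled_ticks; /* consecutive ticks without progress */
	bool hangcheck_armed;
};

/* Assumed threshold: ticks without progress before declaring a hang. */
enum { MODEL_HANG_THRESHOLD = 3 };

/* Submitting the first request arms hangcheck; no waiter is required. */
void model_submit(struct model_gpu *gpu, bool ready)
{
	if (gpu->active++ == 0) {
		gpu->hangcheck_armed = true;
		gpu->last_seen = gpu->seqno;
		gpu->stalled_ticks = 0;
	}
	if (ready)
		gpu->runnable++;
}

/* A previously blocked request becomes ready for the hardware. */
void model_make_ready(struct model_gpu *gpu)
{
	assert(gpu->runnable < gpu->active);
	gpu->runnable++;
}

/* Completing a request advances the seqno; going idle disarms hangcheck. */
void model_retire(struct model_gpu *gpu)
{
	assert(gpu->active > 0 && gpu->runnable > 0);
	gpu->seqno++;
	gpu->runnable--;
	if (--gpu->active == 0)
		gpu->hangcheck_armed = false;
}

/* Periodic check; returns true when a hang is declared. */
bool model_hangcheck_tick(struct model_gpu *gpu)
{
	if (!gpu->hangcheck_armed)
		return false;

	/* Progress was made, or nothing is ready to run: not a hang. */
	if (gpu->seqno != gpu->last_seen || gpu->runnable == 0) {
		gpu->last_seen = gpu->seqno;
		gpu->stalled_ticks = 0;
		return false;
	}

	return ++gpu->stalled_ticks >= MODEL_HANG_THRESHOLD;
}
```

In this model the timer's lifetime follows model_submit() and
model_retire() rather than any waiter, and the runnable == 0 test is
what keeps an idle wait for the next request from being mistaken for a
hang.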

Thanks for the review and discussion,
-Chris