[PATCH 02/43] drm/i915: don't bail out of intel_wait_ring_buffer too early

eugeni at dodonov.net (Eugeni Dodonov) · Wed, 14 Dec 2011 16:39:47 -0200

On Wed, Dec 14, 2011 at 10:56, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:

> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
>
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
>
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
>
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
>
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
>
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> Acked-by: Ben Widawsky <ben at bwidawsk.net>
>

Reviewed-by: Eugeni Dodonov <eugeni.dodonov at intel.com>

-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20111214/ad2b52f1/attachment.htm>