Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking

Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> · Mon, 27 Mar 2017 13:44:00 +0300

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> If the request->wa_tail is 0 (because it landed exactly on the end of
> the ringbuffer), when we reconstruct request->tail following a reset we
> fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> never able to catch up with RING_TAIL and the GPU spins endlessly. If
> the ring contains a couple of breadcrumbs, even our hangcheck is unable
> to catch the busy-looping as the ACTHD and seqno continually advance.

So the tail ends up past the ring size (on hw), and the ring contents
have seqno writes. The hw then replays the ring contents over and over,
and the seqno advances and wraps back to the first breadcrumb in the ring?
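
To check my reading of the arithmetic, here is a small userspace sketch,
not driver code. It assumes WA_TAIL_DWORDS is 2 (inferred from the -8 in
the commit message) and that the hardware keeps only the tail bits covered
by 0x001ffff8 (inferred from the illegal value quoted above). The helper
names tail_unwrapped() and tail_wrapped() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: two dwords, matching the -8 in the commit message. */
#define WA_TAIL_DWORDS 2u
/* Assumed mask: the tail-offset bits implied by the quoted 0x001ffff8. */
#define RING_TAIL_ADDR_MASK 0x001ffff8u

/* Tail as reconstructed before the fix; underflows when wa_tail == 0. */
static uint32_t tail_unwrapped(uint32_t wa_tail)
{
	return wa_tail - WA_TAIL_DWORDS * (uint32_t)sizeof(uint32_t);
}

/* Tail as reconstructed with the fix; wrapped back into the ring. */
static uint32_t tail_wrapped(uint32_t wa_tail, uint32_t ring_size)
{
	/* Masking with size - 1 only wraps correctly for power-of-two sizes. */
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	return tail_unwrapped(wa_tail) & (ring_size - 1);
}
```

With wa_tail == 0, tail_unwrapped() gives 0xfffffff8, which the assumed
register mask turns into 0x001ffff8. For a 16 KiB ring (0x4000, a size
picked only for this example), tail_wrapped() gives 0x3ff8, i.e. the last
qword-aligned slot in the ring, which is still 8-byte aligned as the
GEM_BUG_ON expects.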

> Fixes: a3aabe86a340 ("drm/i915/execlists: Reinitialise context image after GPU hang")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.10+
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3fdabba0a32d..c8dd848d2ebe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1302,6 +1302,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>  
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
>  	request->tail = request->wa_tail - WA_TAIL_DWORDS * sizeof(u32);
> +	request->tail &= request->ring->size - 1;
>  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
>  }
>  
> -- 
> 2.11.0
>