Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Wrap tail pointer after reset tweaking

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 27 Mar 2017 12:19:45 +0100

On Mon, Mar 27, 2017 at 02:07:09PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> >> 
> >> > If the request->wa_tail is 0 (because it landed exactly on the end of
> >> > the ringbuffer), when we reconstruct request->tail following a reset we
> >> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> >> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> >> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> >> > to catch the busy-looping as the ACTHD and seqno continually advance.
> >> 
> >> Tail is past the ring size (on hw) and the ring contents have seqno
> >> writes. So we will replay the ring contents over and over, and the
> >> seqno advances and wraps back to the first breadcrumbs in the ring?
> >
> > Yup. It was most confusing to watch. The execlist_port[] was static,
> > RING_START was static, yet the seqno kept changing. I felt like I was
> > hallucinating. That or insomnia.
> 
> /o\
> 
> When we reset_common_ring() it is always after a hw reset. So the
> 'last' in sense of hardware's lrc contexts doesn't mean much.
> 
> So can we actually get rid of the tail trickery for the first
> request after reset, since a lite restore can't happen there and
> so it should not matter?

So move handling the rare case into the latency sensitive hotpath? ;)

The complaint, I feel, is that we don't have a great interface; otoh, this
manipulation is currently a one-off.
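
For reference, the wrap described in the quoted commit message can be
sketched roughly as below. This is only an illustration, not the driver
code: RING_SIZE, WA_TAIL_BYTES, TAIL_REG_MASK and the helper names are
hypothetical, the ring size is assumed to be a power of two, the 8-byte
adjustment is inferred from the "-8" above, and the 21-bit truncation is
an assumption chosen to reproduce the quoted 0x001ffff8.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE      0x4000u    /* hypothetical ring size, power of two */
#define WA_TAIL_BYTES  8u         /* assumed adjustment, from the "-8" above */
#define TAIL_REG_MASK  0x1fffffu  /* assumed 21-bit truncation of the tail */

/* Wrap a byte offset back into the ring; valid only for power-of-two sizes. */
static uint32_t ring_wrap(uint32_t pos)
{
	return pos & (RING_SIZE - 1);
}

/* Unwrapped reconstruction: underflows when wa_tail is smaller than 8. */
static uint32_t tail_unwrapped(uint32_t wa_tail)
{
	return wa_tail - WA_TAIL_BYTES;
}

/* Wrapped reconstruction: the result always lies inside the ring. */
static uint32_t tail_wrapped(uint32_t wa_tail)
{
	return ring_wrap(wa_tail - WA_TAIL_BYTES);
}

/* Small self-check of the wa_tail == 0 case discussed in the thread. */
static void tail_selfcheck(void)
{
	/* Without the wrap, the truncated tail lands outside the ring. */
	assert((tail_unwrapped(0) & TAIL_REG_MASK) >= RING_SIZE);
	/* With the wrap, it lands on the last 8 bytes of the ring. */
	assert(tail_wrapped(0) == RING_SIZE - WA_TAIL_BYTES);
}
```

With wa_tail == 0, the unwrapped value is past the end of the ring, so
HEAD can never reach it; the wrapped value is the last 8 bytes of the
ring, which HEAD does reach.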
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre