On Mon, Mar 27, 2017 at 02:07:09PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Mon, Mar 27, 2017 at 01:44:00PM +0300, Mika Kuoppala wrote:
> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> >>
> >> > If the request->wa_tail is 0 (because it landed exactly on the end of
> >> > the ringbuffer), when we reconstruct request->tail following a reset we
> >> > fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
> >> > never able to catch up with RING_TAIL and the GPU spins endlessly. If
> >> > the ring contains a couple of breadcrumbs, even our hangcheck is unable
> >> > to catch the busy-looping as the ACTHD and seqno continually advance.
> >>
> >> Tail is past ring size (on hw) and the ring contents has seqno writes.
> >> So we will replay the ring contents over and over and seqno advances
> >> and wraps back to the first breadcrumbs in ring?
> >
> > Yup. It was most confusing to watch. The execlist_port[] was static,
> > RING_START was static, yet the seqno kept changing. I felt like I was
> > hallucinating. That or insomnia.
>
> /o\
>
> When we reset_common_ring() it is always after a hw reset. So the
> 'last' in sense of hardware's lrc contexts doesn't mean much.
>
> So can we actually get rid of the tail trickery as for first
> request after reset, as the lite restore can't happen and
> should not matter?

So move handling the rare case into the latency sensitive hotpath? ;)

The complaint I feel is that we don't have a great interface, otoh this
manipulation is currently a one-off.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
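
For anyone following the arithmetic in the quoted commit message, here is a
minimal standalone sketch of how wa_tail == 0 turns into 0x001ffff8. It is not
the driver code: the 16 KiB ring size, the helper names and the constant names
are assumptions made for illustration. Only the 8-byte step back and the
0x001ffff8 value come from the thread above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a power-of-two ring of 16 KiB, chosen only for illustration. */
#define HYPOTHETICAL_RING_SIZE 0x4000u
/* Bytes stepped back from wa_tail; the "-8" in the commit message. */
#define HYPOTHETICAL_WA_TAIL_BYTES 8u
/* Tail field mask; matches the 0x001ffff8 quoted in the commit message. */
#define HYPOTHETICAL_TAIL_MASK 0x001ffff8u

/* Step back from wa_tail without wrapping into the ring. */
static uint32_t naive_tail(uint32_t wa_tail)
{
	/* Unsigned subtraction underflows when wa_tail is 0. */
	return (wa_tail - HYPOTHETICAL_WA_TAIL_BYTES) & HYPOTHETICAL_TAIL_MASK;
}

/* Same step back, but wrapped to the ring size (power of two assumed). */
static uint32_t wrapped_tail(uint32_t wa_tail)
{
	return (wa_tail - HYPOTHETICAL_WA_TAIL_BYTES) &
	       (HYPOTHETICAL_RING_SIZE - 1u);
}

/* Self-check of the two cases discussed in the thread. */
static void check_tail_reconstruction(void)
{
	/* wa_tail landed exactly on the end of the ring and wrapped to 0. */
	assert(naive_tail(0) == 0x001ffff8u);   /* far past the ring end */
	assert(wrapped_tail(0) == 0x3ff8u);     /* last 8 bytes of the ring */

	/* Anywhere else in the ring both versions agree. */
	assert(naive_tail(16) == wrapped_tail(16));
}
```

With a tail of 0x001ffff8 and a ring of only 0x4000 bytes, a head that wraps at
the ring size can never become equal to the tail, which is consistent with the
endless replay described above. Wrapping at the point of reconstruction keeps
the fixup out of the submission hotpath, at the cost of leaving it a one-off.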