On Sat, Feb 04, 2017 at 07:37:13PM +0000, Chris Wilson wrote:
> Following a reset, the context and page directory registers are lost.
> However, the queue of requests that we resubmit after the reset may
> depend upon them - the registers are restored from a context image, but
> that restore may be inhibited and may simply be absent from the request
> if it was in the middle of a sequence using the same context. If we
> prime the CCID/PD registers with the first request in the queue (even
> for the hung request), we prevent invalid memory access for the
> following requests (and continually hung engines).
>
> Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> ---
>
> This could do with going to stable but requires a few odds and ends, such
> as dma_fence_set_error(). Oh well, fortunately it is not as bad it might
> seem since these registers are restored from the context - but that then
> requires a mesa context to reset the GPU state (as fortunately we called
> MI_SET_CONTEXT at the start of every batch!), but any other request in
> the meantime will likely hang again.
>
> (Also I left gen8/ringbuffer reset_hw as an exercise for the reader)

I'm also puzzled as to how this escaped igt, the fence test should have
tried to write through the aliasing ppgtt without a context restore
(i.e. into randomness) following the hang. Weird. On the positive side,
it may mean the impact isn't as large as I think it should be.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre