Re: [PATCH] drm/i915: Restore context and pd for ringbuffer submission after reset

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Sat, 4 Feb 2017 19:46:16 +0000



On Sat, Feb 04, 2017 at 07:37:13PM +0000, Chris Wilson wrote:
> Following a reset, the context and page directory registers are lost.
> However, the queue of requests that we resubmit after the reset may
> depend upon them - the registers are restored from a context image, but
> that restore may be inhibited and may simply be absent from the request
> if it was in the middle of a sequence using the same context. If we
> prime the CCID/PD registers from the first request in the queue (even
> if that is the hung request), we prevent invalid memory accesses by the
> following requests (and the continually hung engines that result).
> 
> Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> ---
> 
> This could do with going to stable but requires a few odds and ends, such
> as dma_fence_set_error(). Oh well, fortunately it is not as bad as it might
> seem since these registers are restored from the context - but that then
> requires a mesa context to reset the GPU state (as fortunately we called
> MI_SET_CONTEXT at the start of every batch!), but any other request in
> the meantime will likely hang again.
> 
> (Also I left gen8/ringbuffer reset_hw as an exercise for the reader)
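
For readers following along, here is a rough toy model of the failure
the quoted commit message describes. It is a sketch and not the patch
itself: every name in it (toy_engine, toy_request, toy_replay and so on)
is hypothetical and none of it is i915 API. It assumes a reset zeroes
the CCID/PD registers and that a request in the middle of a
same-context sequence carries no restore of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, not the i915 API. */

struct toy_context {
	uint32_t ccid;    /* value the CCID register should hold (non-zero) */
	uint32_t pd_base; /* page directory base for this context (non-zero) */
};

struct toy_request {
	const struct toy_context *ctx;
	bool emits_restore; /* true if the request reloads CCID/PD itself */
};

struct toy_engine {
	uint32_t reg_ccid; /* 0 models "lost after reset" */
	uint32_t reg_pd;
};

/* A reset loses the context and page directory registers. */
void toy_reset(struct toy_engine *e)
{
	assert(e != NULL);
	e->reg_ccid = 0;
	e->reg_pd = 0;
}

/* Prime the registers from the first queued request, if there is one. */
void toy_prime_from_first(struct toy_engine *e,
			  const struct toy_request *queue, size_t n)
{
	assert(e != NULL);
	if (queue == NULL || n == 0)
		return;
	e->reg_ccid = queue[0].ctx->ccid;
	e->reg_pd = queue[0].ctx->pd_base;
}

/* Run one request; false means it ran against the wrong registers. */
bool toy_execute(struct toy_engine *e, const struct toy_request *rq)
{
	if (rq->emits_restore) {
		e->reg_ccid = rq->ctx->ccid;
		e->reg_pd = rq->ctx->pd_base;
	}
	return e->reg_ccid == rq->ctx->ccid && e->reg_pd == rq->ctx->pd_base;
}

/* Reset, optionally prime, then resubmit; returns the count of bad requests. */
size_t toy_replay(struct toy_engine *e, const struct toy_request *queue,
		  size_t n, bool prime)
{
	size_t bad = 0;

	toy_reset(e);
	if (prime)
		toy_prime_from_first(e, queue, n);
	for (size_t i = 0; i < n; i++)
		if (!toy_execute(e, &queue[i]))
			bad++;
	return bad;
}
```

In this model, a queue of requests that share one context and emit no
restore all run against zeroed registers when toy_replay() is called
with prime set to false, and none do when it is set to true.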

I'm also puzzled as to how this escaped igt; the fence test should have
tried to write through the aliasing ppgtt without a context restore
(i.e. into randomness) following the hang. Weird. On the positive side,
it may mean the impact isn't as large as I think it should be.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre