Re: [PATCH] drm/i915/execlists: Reset ring registers on rebinding contexts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Tvrtko uncovered a fun issue with recovering from a wedge device. In his
> tests, he wedged the driver by injecting an unrecoverable hang whilst a
> batch was spinning. As we reset the gpu in the middle of the spinner,
> when resumed it would continue on from the next instruction in the ring
> and write it's breadcrumb. However, on wedging we updated our
> bookkeeping to indicate that the GPU had completed executing and would
> restart from after the breadcrumb; so the emission of the stale
> breadcrumb from before the reset came as a bit of a surprise.
>

Ok trying to make sense of the above and how the wedging works.
Here is my assertions.

The spinning batch was never found to be guilty of anything.
On wedge we fast forwarded all engine seqnos to be what
was last submitted.
We did hw reset.
On context image, the RING_HEAD was pointing to bb start
of spin batch (or the instruction after it)
On resubmitting the context, we saw a seqno write from pre
reset era.

So this doesn't affect only spinning batches but any busy
batch that was running while we wedged?

-Mika

> A simple fix is to when rebinding the context into the GPU, we update
> the ring register state in the context image to match our bookkeeping.
> We already have to update the RING_START and RING_TAIL, so updating
> RING_HEAD as well is trivial. This works because whenever we unbind the
> context, we keep the bookkeeping in check; and on wedging we unbind all
> contexts.
>
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ba7f7831f934..654634254b64 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>  	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>  	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>  		i915_ggtt_offset(ce->ring->vma);
> +	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>  
>  	ce->state->obj->pin_global++;
>  	i915_gem_context_get(ctx);
> -- 
> 2.16.3
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux