Re: [PATCH 05/11] drm/i915/tdr: Identify hung request and drop it

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 27 Jul 2016 13:27:01 +0100



On Wed, Jul 27, 2016 at 12:54:44PM +0100, Arun Siluvery wrote:
> On 26/07/2016 22:37, Chris Wilson wrote:
> >On Tue, Jul 26, 2016 at 05:40:51PM +0100, Arun Siluvery wrote:
> >>The current active request is the one that caused the hang so this is
> >>retrieved and removed from elsp queue, otherwise we cannot submit other
> >>workloads to be processed by GPU.
> >>
> >>A consistency check between HW and driver is performed to ensure that we
> >>are dropping the correct request. Since this request doesn't get executed
> >>anymore, we also need to advance the seqno to mark it as complete. Head
> >>pointer is advanced to skip the offending batch so that HW resumes
> >>execution of other workloads. If HW and SW don't agree then we won't proceed
> >>with engine reset, this is treated as an error condition and we fall back to
> >>full gpu reset.
> >>
> >>Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> >>Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> >>---
> >>  drivers/gpu/drm/i915/intel_lrc.c | 116 +++++++++++++++++++++++++++++++++++++++
> >>  drivers/gpu/drm/i915/intel_lrc.h |   2 +
> >>  2 files changed, 118 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>index daf1279..8fc5a3b 100644
> >>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>@@ -1026,6 +1026,122 @@ void intel_lr_context_unpin(struct i915_gem_context *ctx,
> >>  	i915_gem_context_put(ctx);
> >>  }
> >>
> >>+static void intel_lr_context_resync(struct i915_gem_context *ctx,
> >>+				    struct intel_engine_cs *engine)
> >>+{
> >>+	u32 head;
> >>+	u32 head_addr, tail_addr;
> >>+	u32 *reg_state;
> >>+	struct intel_ringbuffer *ringbuf;
> >>+	struct drm_i915_private *dev_priv = engine->i915;
> >>+
> >>+	ringbuf = ctx->engine[engine->id].ringbuf;
> >>+	reg_state = ctx->engine[engine->id].lrc_reg_state;
> >>+
> >>+	head = I915_READ_HEAD(engine);
> >>+	head_addr = head & HEAD_ADDR;
> >>+	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;
> >
> >?
> >
> >We know where we want the head to be to emit the breadcrumb and complete
> >the request since we can record that when constructing the request. That
> >also neatly solves the riddle of how to update the hw state.
> 
> We want to skip only MI_BATCH_BUFFER_START and continue as usual so
> just using existing info.

That's exactly my point and why this approach is overkill since we
already know where we need to resume from.
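
To make the idea concrete, here is a minimal sketch of recording the
resume point while the request is being constructed, so that skipping a
hung batch only means moving the head to a stored offset. This is a toy
model, not the i915 code: the toy_ring and toy_request types, their
fields, and all function names are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical types, not the real i915 structures. */
struct toy_ring {
	uint32_t size; /* ring size in bytes, must be a power of two */
	uint32_t head; /* next offset the "hardware" will execute */
	uint32_t tail; /* next offset the driver will write */
};

struct toy_request {
	uint32_t seqno;
	uint32_t batch_start; /* offset of the batch-start command */
	uint32_t resume_at;   /* offset of the breadcrumb that follows it */
};

/* Advance an offset by 'bytes', wrapping at the end of the ring. */
static uint32_t toy_ring_advance(const struct toy_ring *ring,
				 uint32_t offset, uint32_t bytes)
{
	return (offset + bytes) & (ring->size - 1);
}

/*
 * Reserve space for the batch-start command and the breadcrumb, and
 * remember where the breadcrumb begins. No hardware state is read.
 */
static void toy_request_emit(struct toy_ring *ring, struct toy_request *req,
			     uint32_t seqno, uint32_t batch_bytes,
			     uint32_t breadcrumb_bytes)
{
	assert(ring->size && (ring->size & (ring->size - 1)) == 0);

	req->seqno = seqno;
	req->batch_start = ring->tail;
	ring->tail = toy_ring_advance(ring, ring->tail, batch_bytes);

	req->resume_at = ring->tail;
	ring->tail = toy_ring_advance(ring, ring->tail, breadcrumb_bytes);
}

/*
 * Drop a hung request: point the head at the recorded breadcrumb so the
 * batch is skipped but the seqno write that completes the request still
 * executes.
 */
static void toy_request_skip(struct toy_ring *ring,
			     const struct toy_request *req)
{
	ring->head = req->resume_at;
}
```

In this model the reset path never has to derive the new head from the
HEAD register or the saved context image. It only needs the offset that
was stored when the request was built.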
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre