Re: [PATCH 05/11] drm/i915/tdr: Identify hung request and drop it

Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx> · Wed, 27 Jul 2016 12:54:44 +0100

On 26/07/2016 22:37, Chris Wilson wrote:
> On Tue, Jul 26, 2016 at 05:40:51PM +0100, Arun Siluvery wrote:
> > The current active request is the one that caused the hang, so it is
> > retrieved and removed from the ELSP queue; otherwise we cannot submit
> > other workloads to be processed by the GPU.
> >
> > A consistency check between HW and driver is performed to ensure that we
> > are dropping the correct request. Since this request doesn't get executed
> > anymore, we also need to advance the seqno to mark it as complete. The head
> > pointer is advanced to skip the offending batch so that HW resumes
> > executing other workloads. If HW and SW don't agree then we won't proceed
> > with the engine reset; this is treated as an error condition and we fall
> > back to a full GPU reset.
> >
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 116 +++++++++++++++++++++++++++++++++++++++
> >   drivers/gpu/drm/i915/intel_lrc.h |   2 +
> >   2 files changed, 118 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index daf1279..8fc5a3b 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1026,6 +1026,122 @@ void intel_lr_context_unpin(struct i915_gem_context *ctx,
> >   	i915_gem_context_put(ctx);
> >   }
> >
> > +static void intel_lr_context_resync(struct i915_gem_context *ctx,
> > +				    struct intel_engine_cs *engine)
> > +{
> > +	u32 head;
> > +	u32 head_addr, tail_addr;
> > +	u32 *reg_state;
> > +	struct intel_ringbuffer *ringbuf;
> > +	struct drm_i915_private *dev_priv = engine->i915;
> > +
> > +	ringbuf = ctx->engine[engine->id].ringbuf;
> > +	reg_state = ctx->engine[engine->id].lrc_reg_state;
> > +
> > +	head = I915_READ_HEAD(engine);
> > +	head_addr = head & HEAD_ADDR;
> > +	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;

> ?
>
> We know where we want the head to be to emit the breadcrumb and complete
> the request since we can record that when constructing the request. That
> also neatly solves the riddle of how to update the hw state.

We only want to skip the MI_BATCH_BUFFER_START and then continue as usual,
so this just uses the state we already have.
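
For illustration only, here is a minimal standalone sketch of the two
options discussed here: advancing the head read from HW past the
batch-start command, versus using an offset recorded when the request was
built. It is not the patch code. Every name below (sketch_request,
sketch_skip_batch_start, sketch_head_from_request, SKETCH_BB_START_BYTES)
is hypothetical, and the 3-dword command length is an assumption that
would need checking against the target hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the batch-start command occupies 3 dwords (12 bytes). */
#define SKETCH_BB_START_BYTES (3u * (uint32_t)sizeof(uint32_t))

/* Hypothetical stand-in for the driver's view of the active request. */
struct sketch_request {
	uint32_t ctx_id;    /* context SW believes is executing */
	uint32_t ring_size; /* ring size in bytes, must be a power of two */
	uint32_t postfix;   /* breadcrumb offset recorded at construction */
};

/*
 * Option 1: start from the head read back from HW and step over the
 * batch-start command, wrapping at the end of the ring.
 * Returns false, leaving *new_head untouched, when HW and SW disagree
 * about which context is running; the caller would then fall back to
 * a full GPU reset.
 */
static inline bool sketch_skip_batch_start(const struct sketch_request *rq,
					   uint32_t hw_ctx_id,
					   uint32_t hw_head,
					   uint32_t *new_head)
{
	assert(rq->ring_size != 0 &&
	       (rq->ring_size & (rq->ring_size - 1)) == 0);

	if (hw_ctx_id != rq->ctx_id)
		return false;

	*new_head = (hw_head + SKETCH_BB_START_BYTES) & (rq->ring_size - 1);
	return true;
}

/*
 * Option 2: ignore the HW head and resume at the offset that was
 * recorded when the request was constructed.
 */
static inline uint32_t sketch_head_from_request(const struct sketch_request *rq)
{
	return rq->postfix;
}
```

With a 4096-byte ring and a HW head of 4092, option 1 wraps around and
yields a new head of 8. Option 2 needs no read-back from HW at all, which
is what makes the recorded offset attractive.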

> resync?  intel_lr_context_reset_ring may be more apt, or maybe
> intel_execlists_reset_request?

I called it resync because we read the current state and update it.
intel_execlists_reset_request() sounds better; I will change it as
suggested. Thanks.

regards
Arun


> -Chris


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx