Re: [PATCH] drm/i915/lrc: Scrub the GPU state of the guilty hanging request

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/27/2018 12:32 PM, Chris Wilson wrote:
Previously, we just reset the ring register in the context image such
that we could skip over the broken batch and emit the closing
breadcrumb. However, on resume the context image and GPU state would be
reloaded, which may have been left in an inconsistent state by the
reset. The presumption was that at worst it would just cause another
reset and skip again until it recovered, however it seems just as likely
to cause an unrecoverable hang. Instead of risking loading an incomplete
context image, restore it back to the default state.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++-------
  1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ce23d5116482..422b05290ed6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1804,8 +1804,8 @@ static void reset_common_ring(struct intel_engine_cs *engine,
  			      struct i915_request *request)
  {
  	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct intel_context *ce;
  	unsigned long flags;
+	u32 *regs;
GEM_TRACE("%s request global=%x, current=%d\n",
  		  engine->name, request ? request->global_seqno : 0,
@@ -1855,14 +1855,24 @@ static void reset_common_ring(struct intel_engine_cs *engine,
  	 * future request will be after userspace has had the opportunity
  	 * to recreate its own state.
  	 */
-	ce = &request->ctx->engine[engine->id];
-	execlists_init_reg_state(ce->lrc_reg_state,
-				 request->ctx, engine, ce->ring);
+	regs = request->ctx->engine[engine->id].lrc_reg_state;
+	if (engine->default_state) {
+		void *defaults;
+
+		defaults = i915_gem_object_pin_map(engine->default_state,
+						   I915_MAP_WB);
+		if (!IS_ERR(defaults)) {
+			memcpy(regs,
+			       defaults + LRC_HEADER_PAGES * PAGE_SIZE,
+			       engine->context_size);
Hi,

The context_size is taking into count the PP_HWSP page, do we also need to rewrite the PP_HSWP? (or just the logical state).

Also regs is already pointing to the start of the logical state
(vaddr + LRC_STATE_PN * PAGE_SIZE).

So if we want to overwrite from the PP_HWSP, then regs is not the right offset, or if we only want to change the logical state then it should be from 'defaults + LRC_STATE_PN * PAGE_SIZE'.

-Michel

+			i915_gem_object_unpin_map(engine->default_state);
+		}
+	}
+	execlists_init_reg_state(regs, request->ctx, engine, request->ring);
/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
-	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
-		i915_ggtt_offset(ce->ring->vma);
-	ce->lrc_reg_state[CTX_RING_HEAD+1] = request->postfix;
+	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
+	regs[CTX_RING_HEAD + 1] = request->postfix;
request->ring->head = request->postfix;
  	intel_ring_update_space(request->ring);

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux