On 16/02/15 16:12, Thomas Daniel wrote: > Work was getting left behind in LRC contexts during reset. This causes a hang > if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and > tail don't get reset and retiring a request doesn't alter them, so the ring > still appears full. > > Added a function intel_lr_context_reset() to reset head and tail on a LRC and > its ringbuffer. > > Call intel_lr_context_reset() for each context in i915_gem_context_reset() when > in execlists mode. > > Testcase: igt/pm_rps --run-subtest reset #bdw > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096 > Signed-off-by: Thomas Daniel <thomas.daniel@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++---- > drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/intel_lrc.h | 2 ++ > 3 files changed, 44 insertions(+), 4 deletions(-) A couple of minor points below, but not ones that require changes, so: Reviewed-by: Dave Gordon <david.s.gordon@xxxxxxxxx> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 8603bf4..70346b0 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -296,11 +296,15 @@ void i915_gem_context_reset(struct drm_device *dev) > struct drm_i915_private *dev_priv = dev->dev_private; > int i; > > - /* In execlists mode we will unreference the context when the execlist > - * queue is cleared and the requests destroyed. > - */ > - if (i915.enable_execlists) > + if (i915.enable_execlists) { > + struct intel_context *ctx; > + > + list_for_each_entry(ctx, &dev_priv->context_list, link) { > + intel_lr_context_reset(dev, ctx); > + } > + > return; > + } > > for (i = 0; i < I915_NUM_RINGS; i++) { > struct intel_engine_cs *ring = &dev_priv->ring[i]; > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index aafcef3..1946bb9 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1950,3 +1950,37 @@ error_unpin_ctx: > drm_gem_object_unreference(&ctx_obj->base); > return ret; > } > + > +void intel_lr_context_reset(struct drm_device *dev, > + struct intel_context *ctx) > +{ > + struct drm_i915_private *dev_priv = dev->dev_private; > + struct intel_engine_cs *ring; > + int i; > + > + for_each_ring(ring, dev_priv, i) { > + struct drm_i915_gem_object *ctx_obj = > + ctx->engine[ring->id].state; > + if (ctx_obj) { > + struct intel_ringbuffer *ringbuf = > + ctx->engine[ring->id].ringbuf; > + uint32_t *reg_state; > + struct page *page; > + > + if (i915_gem_object_get_pages(ctx_obj)) { > + WARN(1, "Failed get_pages for context obj\n"); > + continue; > + } This could be folded into a single "if (WARN_ON(...)) continue;" > + page = i915_gem_object_get_page(ctx_obj, 1); Isn't it a pity that we have i915_gem_object_get_page() and i915_gem_object_get_pages() which look so similar but do completely different things :( > + reg_state = kmap_atomic(page); > + > + reg_state[CTX_RING_HEAD+1] = 0; > + reg_state[CTX_RING_TAIL+1] = 0; > + > + kunmap_atomic(reg_state); > + > + ringbuf->head = 0; > + ringbuf->tail = 0; > + } > + } > +} > diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h > index f635735..5dd0eca 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.h > +++ b/drivers/gpu/drm/i915/intel_lrc.h > @@ -73,6 +73,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx, > struct intel_engine_cs *ring); > void intel_lr_context_unpin(struct intel_engine_cs *ring, > struct intel_context *ctx); > +void intel_lr_context_reset(struct drm_device *dev, > + struct intel_context *ctx); > > /* Execlists */ > int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists); > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx