On Wed, Mar 15, 2017 at 09:23:01AM +0000, Tvrtko Ursulin wrote:
> 
> On 14/03/2017 09:34, Chris Wilson wrote:
> >When we wedge the device, we override engine->submit_request with a nop
> >to ensure that all in-flight requests are marked in error. However, igt
> >would like to unwedge the device to test -EIO handling. This requires us
> >to flush those in-flight requests and restore the original
> >engine->submit_request.
> >
> >v2: Use a vfunc to unify enabling request submission to engines
> >v3: Split new vfunc to a separate patch.
> >
> >Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> >Testcase: igt/gem_eio
> >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >---
> > drivers/gpu/drm/i915/i915_drv.c |  2 +-
> > drivers/gpu/drm/i915/i915_drv.h |  1 +
> > drivers/gpu/drm/i915/i915_gem.c | 51 +++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 53 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >index e312b61ba6bb..11d1066b673c 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.c
> >+++ b/drivers/gpu/drm/i915/i915_drv.c
> >@@ -1822,7 +1822,7 @@ void i915_reset(struct drm_i915_private *dev_priv)
> > 		return;
> >
> > 	/* Clear any previous failed attempts at recovery. Time to try again. */
> >-	__clear_bit(I915_WEDGED, &error->flags);
> >+	i915_gem_unset_wedged(dev_priv);
> > 	error->reset_count++;
> >
> > 	pr_notice("drm/i915: Resetting chip after gpu hang\n");
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 48ff64812289..53a791d8d992 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -3406,6 +3406,7 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv);
> > void i915_gem_reset(struct drm_i915_private *dev_priv);
> > void i915_gem_reset_finish(struct drm_i915_private *dev_priv);
> > void i915_gem_set_wedged(struct drm_i915_private *dev_priv);
> >+void i915_gem_unset_wedged(struct drm_i915_private *dev_priv);
> >
> > void i915_gem_init_mmio(struct drm_i915_private *i915);
> > int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 202bb850f260..e06830916a05 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3000,6 +3000,57 @@ void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> > 	mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
> > }
> >
> >+void i915_gem_unset_wedged(struct drm_i915_private *i915)
> >+{
> >+	struct i915_gem_timeline *tl;
> >+	int i;
> >+
> >+	lockdep_assert_held(&i915->drm.struct_mutex);
> >+	if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
> >+		return;
> >+
> >+	/* Before unwedging, make sure that all pending operations
> >+	 * are flushed and errored out. No more can be submitted until
> >+	 * we reset the wedged bit.
> >+	 */
> >+	list_for_each_entry(tl, &i915->gt.timelines, link) {
> >+		for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
> >+			struct drm_i915_gem_request *rq;
> >+
> >+			rq = i915_gem_active_peek(&tl->engine[i].last_request,
> >+						  &i915->drm.struct_mutex);
> >+			if (!rq)
> >+				continue;
> >+
> >+			/* We can't use our normal waiter as we want to
> >+			 * avoid recursively trying to handle the current
> >+			 * reset. The basic dma_fence_default_wait() installs
> >+			 * a callback for dma_fence_signal(), which is
> >+			 * triggered by our nop handler (indirectly, the
> >+			 * callback enables the signaler thread which is
> >+			 * woken by the nop_submit_request() advancing the seqno
> >+			 * and when the seqno passes the fence, the signaler
> >+			 * then signals the fence waking us up).
> >+			 */
> >+			dma_fence_default_wait(&rq->fence, false,
> >+					       MAX_SCHEDULE_TIMEOUT);
> >+		}
> >+	}
> >+
> >+	/* Undo nop_submit_request. We prevent all new i915 requests from
> >+	 * being queued (by disallowing execbuf whilst wedged) so having
> >+	 * waited for all active requests above, we know the system is idle
> >+	 * and do not have to worry about a thread being inside
> >+	 * engine->submit_request() as we swap over. So unlike installing
> >+	 * the nop_submit_request on reset, we can do this from normal
> >+	 * context and do not require stop_machine().
> >+	 */
> >+	intel_engines_enable_submission(i915);
> 
> So the point of the dma_fence_default_wait above is it to ensure all
> nop_submit_request callbacks have completed? I don't at the moment
> understand how there could be such callbacks since unwedge is
> happening after the wedge. So the wedge already installed the nop
> handler, and by the time we get to another reset attempt, isn't it
> already guaranteed all of those have exited?

There's no such guarantee. The nop_submit_request() is to ensure that
all third party driven requests are flushed and our requests are marked
with dma_fence_set_error(-EIO) and not submitted to hw.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx