On Tue, Apr 18, 2017 at 01:23:20PM -0700, Michel Thierry wrote: > From: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx> > > This change implements support for per-engine reset as an initial, less > intrusive hang recovery option to be attempted before falling back to the > legacy full GPU reset recovery mode if necessary. This is only supported > from Gen8 onwards. > > Hangchecker determines which engines are hung and invokes error handler to > recover from it. Error handler schedules recovery for each of those engines > that are hung. The recovery procedure is as follows, > - identifies the request that caused the hang and it is dropped > - force engine to idle: this is done by issuing a reset request > - reset and re-init engine > - restart submissions to the engine > > If engine reset fails then we fall back to heavy weight full gpu reset > which resets all engines and reinitiazes complete state of HW and SW. > > v2: Rebase. > v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before > calling i915_gem_reset_engine (Chris). > v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and > reuse the function for reset_engine. > v5: intel_reset_engine_start/cancel instead of request/unrequest_reset. > v6: Clean up reset_engine function to not require mutex, i.e. no need to call > revoke/restore_fences and _retire_requests (Chris). > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx> > Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx> > Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_drv.c | 76 ++++++++++++++++++++++++-- > drivers/gpu/drm/i915/i915_drv.h | 12 +++- > drivers/gpu/drm/i915/i915_gem.c | 97 +++++++++++++++++++-------------- > drivers/gpu/drm/i915/i915_gem_request.c | 2 +- > drivers/gpu/drm/i915/intel_uncore.c | 20 +++++++ > 5 files changed, 158 insertions(+), 49 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index e03d0643dbd6..634893cd93b3 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -1810,7 +1810,7 @@ void i915_reset(struct drm_i915_private *dev_priv) > > pr_notice("drm/i915: Resetting chip after gpu hang\n"); > disable_irq(dev_priv->drm.irq); > - ret = i915_gem_reset_prepare(dev_priv); > + ret = i915_gem_reset_prepare(dev_priv, ALL_ENGINES); > if (ret) { > DRM_ERROR("GPU recovery failed\n"); > intel_gpu_reset(dev_priv, ALL_ENGINES); > @@ -1852,7 +1852,7 @@ void i915_reset(struct drm_i915_private *dev_priv) > i915_queue_hangcheck(dev_priv); > > finish: > - i915_gem_reset_finish(dev_priv); > + i915_gem_reset_finish(dev_priv, ALL_ENGINES); > enable_irq(dev_priv->drm.irq); > > wakeup: > @@ -1871,11 +1871,79 @@ void i915_reset(struct drm_i915_private *dev_priv) > * > * Reset a specific GPU engine. Useful if a hang is detected. > * Returns zero on successful reset or otherwise an error code. > + * > + * Procedure is: > + * - identifies the request that caused the hang and it is dropped > + * - force engine to idle: this is done by issuing a reset request > + * - reset engine > + * - restart submissions to the engine > */ > int i915_reset_engine(struct intel_engine_cs *engine) > { > - /* FIXME: replace me with engine reset sequence */ > - return -ENODEV; > + int ret; > + struct drm_i915_private *dev_priv = engine->i915; > + struct i915_gpu_error *error = &dev_priv->gpu_error; > + > + GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags)); > + > + DRM_DEBUG_DRIVER("resetting %s\n", engine->name); > + > + /* > + * We need to first idle the engine by issuing a reset request, > + * then perform soft reset and re-initialize hw state, for all of > + * this GT power need to be awake so ensure it does throughout the > + * process > + */ > + intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); Hmm, what path did we take to get here without taking rpm? I thought I had fixed the last offender. > + disable_irq(dev_priv->drm.irq); I am 99% certain that we don't need to disable_irq() now for per-engine reset... I'd keep it in the global reset as simple paranoia. > + ret = i915_gem_reset_prepare_engine(engine); > + if (ret) { > + DRM_ERROR("Previous reset failed - promote to full reset\n"); > + goto error; > + } > + > + /* > + * the request that caused the hang is stuck on elsp, identify the > + * active request and drop it, adjust head to skip the offending > + * request to resume executing remaining requests in the queue. > + */ Hmm. Interesting. This relies on i915_gem_retire_requests() (i.e. struct_mutex) to skip replaying innocent requests, but here we should be asserting that we do have the hung request. i.e. request = i915_gem_find_active_request(engine); if (!request) goto skip. Bonus points for tying that into i915_gem_reset_prepare_engine() so that we only seach for the active_request once. > + i915_gem_reset_engine(engine); Where does skip_request() get called? > + > + /* forcing engine to idle */ > + ret = intel_reset_engine_start(engine); > + if (ret) { > + DRM_ERROR("Failed to disable %s\n", engine->name); > + goto error; > + } > + > + /* finally, reset engine */ > + ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine)); > + if (ret) { > + DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret); > + intel_reset_engine_cancel(engine); > + goto error; > + } > + > + /* be sure the request reset bit gets cleared */ > + intel_reset_engine_cancel(engine); > + > + i915_gem_reset_finish_engine(engine); > + > + /* replay remaining requests in the queue */ > + ret = engine->init_hw(engine); > + if (ret) > + goto error; > + > +wakeup: > + enable_irq(dev_priv->drm.irq); > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > + wake_up_bit(&error->flags, I915_RESET_HANDOFF); No handoff here anymore. > + return ret; > + > +error: > + /* use full gpu reset to recover on error */ > + goto wakeup; > } -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx