On Mon, Sep 19, 2016 at 04:30:15PM +0100, Matthew Auld wrote:
> From: "arun.siluvery@xxxxxxxxxxxxxxx" <arun.siluvery@xxxxxxxxxxxxxxx>
>
> This change implements support for per-engine reset as an initial, less
> intrusive hang recovery option to be attempted before falling back to the
> legacy full GPU reset recovery mode if necessary. This is only supported
> from Gen8 onwards.
>
> Hangchecker determines which engines are hung and invokes error handler to
> recover from it. Error handler schedules recovery for each of those engines
> that are hung. The recovery procedure is as follows,
> - identifies the request that caused the hang and it is dropped
> - force engine to idle: this is done by issuing a reset request
> - reset and re-init engine
> - restart submissions to the engine
>
> If engine reset fails then we fall back to heavy weight full gpu reset
> which resets all engines and reinitiazes complete state of HW and SW.
>
> v2
> - rebase
>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx>
> Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.c     | 59 +++++++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_drv.h     |  3 ++
>  drivers/gpu/drm/i915/i915_gem.c     |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c    | 10 +++++++
>  drivers/gpu/drm/i915/intel_lrc.h    |  1 +
>  drivers/gpu/drm/i915/intel_uncore.c | 41 +++++++++++++++++++++++---
>  6 files changed, 105 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 99fa690..8625207 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1812,21 +1812,68 @@ error:
>   * Returns zero on successful reset or otherwise an error code.
>   *
>   * Procedure is fairly simple:
> - * - force engine to idle
> - * - save current state which includes head and current request
> - * - reset engine
> - * - restore saved state and resubmit context
> + * - identifies the request that caused the hang and it is dropped
> + * - force engine to idle: this is done by issuing a reset request
> + * - reset engine
> + * - restart submissions to the engine
>   */
>  int i915_reset_engine(struct intel_engine_cs *engine)
>  {
>  	int ret;
>  	struct drm_i915_private *dev_priv = engine->i915;
>
> -	/* FIXME: replace me with engine reset sequence */
> -	ret = -ENODEV;
> +	/*
> +	 * We need to first idle the engine by issuing a reset request,
> +	 * then perform soft reset and re-initialize hw state, for all of
> +	 * this GT power need to be awake so ensure it does throughout the
> +	 * process
> +	 */
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	/*
> +	 * the request that caused the hang is stuck on elsp, identify the
> +	 * active request and drop it, adjust head to skip the offending
> +	 * request to resume executing remaining requests in the queue.
> +	 */
> +	i915_gem_reset_engine(engine);
> +
> +	ret = intel_engine_reset_begin(engine);
> +	if (ret) {
> +		DRM_ERROR("Failed to disable %s\n", engine->name);
> +		goto error;
> +	}
> +
> +	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> +	if (ret) {
> +		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
> +		intel_engine_reset_cancel(engine);
> +		goto error;
> +	}

Ordering is still broken.

> +
> +	ret = engine->init_hw(engine);
> +	if (ret)
> +		goto error;
> +
> +	intel_engine_reset_cancel(engine);
> +	intel_execlists_restart_submission(engine);

This is broken.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
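For illustration only, the recovery flow described in the quoted commit message (drop the hung request, idle the engine, reset and re-init it, restart submissions, and fall back to a full GPU reset if the engine reset fails) can be modelled in plain C. This is a hypothetical, self-contained simulation: struct sim_engine and every sim_* function are invented for this sketch and are not i915 APIs. It follows the step order as written in the commit message; it does not model forcewake, ELSP, or the reset begin/cancel handshake, and it is not a proposed answer to the ordering problems raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the recovery flow from the commit message.
 * None of these types or functions exist in i915.
 */
struct sim_engine {
	const char *name;
	bool reset_fails;	/* simulate the per-engine reset failing */
	int dropped_requests;	/* hung requests dropped so far */
	bool idle;		/* engine idled by the reset request */
	bool running;		/* submissions active */
};

static int sim_full_gpu_resets;	/* number of heavy-weight fallbacks */

/* Step 1: identify the request that caused the hang and drop it. */
static void sim_drop_hung_request(struct sim_engine *e)
{
	e->dropped_requests++;
	e->running = false;
}

/* Step 2: force the engine to idle by issuing a reset request. */
static void sim_idle_engine(struct sim_engine *e)
{
	e->idle = true;
}

/* Step 3: reset and re-init the engine; nonzero means failure. */
static int sim_reset_and_reinit(struct sim_engine *e)
{
	return e->reset_fails ? -1 : 0;
}

/* Fallback: reset every engine and reinitialise all state. */
static void sim_full_gpu_reset(struct sim_engine *engines, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		engines[i].idle = false;
		engines[i].running = true;
	}
	sim_full_gpu_resets++;
}

/*
 * Per-engine recovery: returns 0 if the light-weight reset succeeded,
 * or -1 if it had to fall back to a full GPU reset.
 */
static int sim_recover_engine(struct sim_engine *e,
			      struct sim_engine *all, size_t n)
{
	assert(e != NULL && all != NULL);

	sim_drop_hung_request(e);
	sim_idle_engine(e);

	if (sim_reset_and_reinit(e) != 0) {
		sim_full_gpu_reset(all, n);
		return -1;
	}

	/* Step 4: restart submissions to the engine. */
	e->idle = false;
	e->running = true;
	return 0;
}
```

A caller would invoke sim_recover_engine() once per hung engine reported by the hang checker; a return of -1 means the whole (simulated) GPU was reset, so every engine's state was reinitialised, not just the hung one.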