Quoting Jeff McGee (2017-11-01 20:41:13) > On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote: > > Quoting Michel Thierry (2017-10-31 22:53:09) > > > This patch adds per engine reset and recovery (TDR) support when GuC is > > > used to submit workloads to GPU. > > > > > > In the case of i915 directly submission to ELSP, driver manages hang > > > detection, recovery and resubmission. With GuC submission these tasks > > > are shared between driver and GuC. i915 is still responsible for detecting > > > a hang, and when it does it only requests GuC to reset that Engine. GuC > > > internally manages acquiring forcewake and idling the engine before > > > resetting it. > > > > > > Once the reset is successful, i915 takes over again and handles the > > > resubmission. The scheduler in i915 knows which requests are pending so > > > after resetting a engine, pending workloads/requests are resubmitted > > > again. > > > > > > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the > > > non-guc function names. > > > > > > v3: Removed debug message about engine restarting from which request, > > > since the new baseline do it regardless of submission mode. (Chris) > > > > > > v4: Rebase. > > > > > > v5: Do not pass unnecessary reporting flags to the fw (Jeff); > > > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase. > > > > > > v6: Rename the existing reset engine function and share a similar > > > interface between guc and non-guc paths (Chris). > > > > > > Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx> > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > --- > > > drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++++++-- > > > drivers/gpu/drm/i915/i915_drv.h | 2 ++ > > > drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++ > > > drivers/gpu/drm/i915/intel_guc_fwif.h | 1 + > > > drivers/gpu/drm/i915/intel_uncore.c | 5 ----- > > > 5 files changed, 40 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > > index af745749509c..359333a423cf 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.c > > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags) > > > goto finish; > > > } > > > > > > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv, > > > + struct intel_engine_cs *engine) > > > +{ > > > + return intel_gpu_reset(dev_priv, intel_engine_flag(engine)); > > > +} > > > + > > > /** > > > * i915_reset_engine - reset GPU engine to recover from a hang > > > * @engine: engine to reset > > > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags) > > > goto out; > > > } > > > > > > - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine)); > > > + if (!engine->i915->guc.execbuf_client) > > > + ret = intel_gt_reset_engine(engine->i915, engine); > > > + else > > > + ret = intel_guc_reset_engine(&engine->i915->guc, engine); > > > + > > > if (ret) { > > > /* If we fail here, we expect to fallback to a global reset */ > > > - DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n", > > > + DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n", > > > + (engine->i915->guc.execbuf_client ? "GUC ":""), > > > > A bit overkill on the parentheses there ;) > > > > Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc > > interaction? > > -Chris > > There is one small change needed in the GuC preemption protocol to make it > compatible with GuC engine reset. I will send that shortly. > > There are also a couple of corner case bugs with engine reset in our current > firmware versions. We are planning a firmware update to address those. But > the host-side code here is fine. So... > > Reviewed-by: Jeff McGee <jeff.mcgee@xxxxxxxxx> Pushed this series, along with the preempt interaction fix. Thanks, -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx