On Mon, Dec 15, 2014 at 07:03:43PM +0200, Ville Syrjälä wrote:
> On Mon, Dec 15, 2014 at 04:24:48PM +0000, Chris Wilson wrote:
> > On Mon, Dec 15, 2014 at 10:41:52AM +0100, Daniel Vetter wrote:
> > > On Thu, Dec 11, 2014 at 08:17:01AM +0000, Chris Wilson wrote:
> > > > There exists a current workaround to prevent a hang on context switch
> > > > should the ring go to sleep in the middle of the restore,
> > > > WaProgramMiArbOnOffAroundMiSetContext (applicable to all gen7+). In
> > > > spite of disabling arbitration (which prevents the ring from powering
> > > > down during the critical section) we were still hitting hangs that had
> > > > the hallmarks of the known erratum; that is, we were still seeing hangs
> > > > "on the last instruction in the context restore". By comparing -nightly
> > > > (broken) with requests (working), we were able to deduce that it was the
> > > > semaphore LRI cross-talk that reproduced the original failure.
> > > > Explicitly disabling PSMI sleep on the RCS ring was insufficient; all
> > > > the rings had to be awake to prevent the hangs. Fortunately, we can
> > > > reduce the wakelock to the MI_SET_CONTEXT operation itself, and so
> > > > should be able to limit the extra power implications.
> > > >
> > > > Since the MI_ARB_ON_OFF workaround is listed for all gen7 and above
> > > > products, we should apply this extra hammer to all of the same
> > > > platforms, even though so far we have only been able to reproduce the
> > > > hang on certain ivb and hsw models. The last question is whether we want
> > > > to always use the extra hammer or only when we know semaphores are in
> > > > operation. At the moment, we only use LRI on non-RCS rings for
> > > > semaphores, but that may change in the future with the possibility of
> > > > reintroducing this bug under subtle conditions.
> > > >
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80660
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83677
> > > > Cc: Simon Farnsworth <simon@xxxxxxxxxxxx>
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Cc: stable@xxxxxxxxxxxxxxx
> > >
> > > Oh dear, but awesome piece of debugging!
> > >
> > > First question: How does this work on vlv with the split forcewake
> > > domain? Any bad side effects like longer ctx switch times?
> >
> > It will work on vlv as well since the split power domains do not matter
> > here: the LRI is on an active ring, and the result of the LRI is to
> > temporarily wake up the other domain.
> > The negative impact is that we do wake up other power domains and rings
> > during the context switch, a small but definite power increase.
>
> I don't think there's any point in worrying about that for context
> switches as long as we still have the "semaphore LRI wakes up all
> power wells for every single batch" issue around.

Hm right, there have been patches floating around for that one ... A short
mention in the commit message about this would be useful.

> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem_context.c | 39 ++++++++++++++++++++++++---------
> > > >  drivers/gpu/drm/i915/i915_reg.h         |  2 ++
> > > >  2 files changed, 31 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > index 2acf5803cf32..724ccecea06a 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > > @@ -484,7 +484,9 @@ mi_set_context(struct intel_engine_cs *ring,
> > > >  		       u32 hw_flags)
> > > >  {
> > > >  	u32 flags = hw_flags | MI_MM_SPACE_GTT;
> > > > -	int ret;
> > > > +	struct intel_engine_cs *engine;
> > > > +	const int num_rings = i915_semaphore_is_enabled(ring->dev) ?
> > > > 		hweight32(INTEL_INFO(ring->dev)->ring_mask) : 0;
> > > > +	int len, i, ret;
> > > >
> > > >  	/* w/a: If Flush TLB Invalidation Mode is enabled, driver must do a TLB
> > > >  	 * invalidation prior to MI_SET_CONTEXT. On GEN6 we don't set the value
> > > > @@ -501,15 +503,26 @@ mi_set_context(struct intel_engine_cs *ring,
> > > >  	if (!IS_HASWELL(ring->dev) && INTEL_INFO(ring->dev)->gen < 8)
> > > >  		flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> > > >
> > > > -	ret = intel_ring_begin(ring, 6);
> > > > +
> > > > +	len = 4;
> > > > +	if (INTEL_INFO(ring->dev)->gen >= 7)
> > > > +		len += 4*num_rings + 2;
> > > > +
> > > > +	ret = intel_ring_begin(ring, len);
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > >  	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
> > > > -	if (INTEL_INFO(ring->dev)->gen >= 7)
> > > > -		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
> > > > -	else
> > > > -		intel_ring_emit(ring, MI_NOOP);
> > > > +	if (INTEL_INFO(ring->dev)->gen >= 7) {
> > > > +		if (num_rings) {
> > > > +			intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(num_rings));
> > > > +			for_each_ring(engine, to_i915(ring->dev), i) {
> > > > +				intel_ring_emit(ring, RING_PSMI_CTL(engine->mmio_base));
> > > > +				intel_ring_emit(ring, _MASKED_BIT_ENABLE(GEN6_PSMI_SLEEP_MSG_DISABLE));
> > > > +			}
> > > > +		} else
> > > > +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
> > >
> > > For added paranoia, shouldn't we keep this one here? Or is the only
> > > thing this does frob the idle notifications like PSMI? I kinda prefer
> > > we just keep it if there's no harmful effects ...
> >
> > Yeah, MI_ARB_ON_OFF is just a magic command used to do the same as the
> > PSMI twiddling on the local ring.
> >
> > We can do MI_ARB_ON_OFF + LRI(!RCS), which is slightly shorter
> > instruction-wise, closer to the original w/a, and looks slightly neater.
Tbh, without confirmation from a hw uarch that they're equivalent, I prefer
we keep the mi_arb dance for the local ring, just in case it has some other
magic sauce going on. Means a bit more testing unfortunately, but this has
been going on for a long time already anyway.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch