On Fri, Nov 27, 2015 at 01:32:11PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > Following a GPU reset, we may leave the context in a poorly defined
> > state, and reloading from that context will leave the GPU flummoxed. For
> > secondary contexts, this will lead to that context being banned - but
> > currently it is also causing the default context to become banned,
> > leading to turmoil in the shared state.
> >
> > This is a regression from
> >
> > commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> > Author: Ben Widawsky <benjamin.widawsky@xxxxxxxxx>
> > Date:   Mon Mar 16 16:00:58 2015 +0000
> >
> >     drm/i915: Initialize all contexts
> >
> > which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> > default context.
> >
>
> As we never submit anything except driver initialization commands
> for that context, what would cause this context to become corrupted?

I can only hazard that the act of resetting the GPU left it invalid. A
bisect pointed to that commit, and partially reverting each chunk left
me with the conclusion that the hang was a direct result of reloading
the context. Closer inspection may reveal something else suspect about
the context, but I object to this sneaky change.
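
To make the behavioural difference concrete, here is a rough standalone
model of the decision being argued about. It is only a sketch, not the
driver code: struct model_ctx, MODEL_RESTORE_INHIBIT and both helper
functions are hypothetical names invented for illustration, and the flag
value is not the hardware encoding. It assumes the only inputs that
matter are whether the context is the default one and whether it has
been initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this model only; not the real MI_SET_CONTEXT bit. */
#define MODEL_RESTORE_INHIBIT (1u << 0)

/* Hypothetical, stripped-down stand-in for a hardware context. */
struct model_ctx {
	bool is_default;   /* true for the default context */
	bool initialized;  /* true once the context has been set up */
};

/*
 * Behaviour as described for the regression: restore is inhibited only
 * while the context is uninitialized, so an initialized default context
 * has its saved image reloaded like any other context.
 */
static unsigned int switch_flags_regressed(const struct model_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->initialized ? 0 : MODEL_RESTORE_INHIBIT;
}

/*
 * Behaviour being argued for: additionally never reload the default
 * context, so a possibly invalid image left by a GPU reset is not used.
 */
static unsigned int switch_flags_patched(const struct model_ctx *ctx)
{
	assert(ctx != NULL);
	if (!ctx->initialized || ctx->is_default)
		return MODEL_RESTORE_INHIBIT;
	return 0;
}
```

In this model an initialized default context gets no inhibit flag from
switch_flags_regressed() but does from switch_flags_patched(), while an
initialized secondary context is reloaded in both cases.
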

> Please consider:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5..45b9a39 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -332,6 +332,7 @@ void i915_gem_context_reset(struct drm_device *dev)
>         for (i = 0; i < I915_NUM_RINGS; i++) {
>                 struct intel_engine_cs *ring = &dev_priv->ring[i];
>                 struct intel_context *lctx = ring->last_context;
> +               struct intel_context *dctx = ring->default_context;
>
>                 if (lctx) {
>                         if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> @@ -340,6 +341,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>                         i915_gem_context_unreference(lctx);
>                         ring->last_context = NULL;
>                 }
> +
> +               if (dctx)
> +                       dctx->legacy_hw_ctx.initialized = false;
>         }
>  }
>
> To achieve the same effect and as a bonus, get the
> same default context (with workarounds) as we
> did in driver init.

I considered it, and wondered why it wasn't already there. It is a
separate issue imo.

> I also think that we should zero the global
> default context in here to gain similarity wrt
> module init.

You mean reallocate it from scratch? We have avoided doing the
reallocations in the past, as they can fail at inopportune times
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx