Re: [PATCH v2] drm/i915: Prevent TLB error on first execution on SNB

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 10 Mar 2015 10:35:41 +0000

On Tue, Mar 10, 2015 at 11:31:04AM +0100, Daniel Vetter wrote:
> On Fri, Feb 13, 2015 at 02:35:59PM +0000, Chris Wilson wrote:
> > Long ago I found that I was getting sporadic errors when booting SNB,
> > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > typically caused by the TLB being invalid. These magically disappeared
> > if I held the forcewake during the entire ring initialisation sequence.
> > (It could probably be narrowed to a smaller critical section, but the whole
> > initialisation is full of register writes, so we would be taking and
> > releasing forcewake almost continually; holding it over the entire
> > sequence is probably a net win!)
> > 
> > Note that some of the kernels on which I encountered the issue already had
> > the deferred forcewake release, so this is still relevant.
> > 
> > I know that there have been a few other reports of similar failures
> > on SNB, possibly including:
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > 
> > v2: Wrap i915_gem_init_hw() with its own security blanket as we take
> > that path following resume and reset.
> > 
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 8d15c8110962..08450922f373 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4783,6 +4783,9 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> >  		return -EIO;
> >  
> > +	/* Double layer security blanket, see i915_gem_init() */
> > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > +
> >  	if (dev_priv->ellc_size)
> >  		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
> >  
> > @@ -4815,7 +4818,7 @@ i915_gem_init_hw(struct drm_device *dev)
> >  	for_each_ring(ring, dev_priv, i) {
> >  		ret = ring->init_hw(ring);
> >  		if (ret)
> > -			return ret;
> > +			goto out;
> >  	}
> >  
> >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > @@ -4832,9 +4835,11 @@ i915_gem_init_hw(struct drm_device *dev)
> >  		DRM_ERROR("Context enable failed %d\n", ret);
> >  		i915_gem_cleanup_ringbuffer(dev);
> >  
> > -		return ret;
> > +		goto out;
> >  	}
> >  
> > +out:
> > +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> >  	return ret;
> >  }
> >  
> > @@ -4868,6 +4873,14 @@ int i915_gem_init(struct drm_device *dev)
> >  		dev_priv->gt.stop_ring = intel_logical_ring_stop;
> >  	}
> >  
> > +	/* This is just a security blanket to placate dragons.
> > +	 * On some systems, we very sporadically observe that the first TLBs
> > +	 * used by the CS may be stale, despite us poking the TLB reset. If
> > +	 * we hold the forcewake during initialisation these problems
> > +	 * just magically go away.
> > +	 */
> > +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> 
> gem_init shouldn't ever touch the hw except through gem_init_hw. Do we
> really need the double-layer here?

There are register accesses before we reach i915_gem_init_hw(), so yes.
That is also the configuration I tested...
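Why the double layer is harmless, and why holding forcewake across the whole sequence is cheaper than waking per access, can be illustrated with a small user-space model. This is a sketch, assuming forcewake behaves like a simple reference count where only the first get wakes the hardware; `fw_domain`, `fw_get()`, `fw_put()`, `reg_write()` and the `model_*` functions are hypothetical and are not the i915 API. It also mirrors the patch's switch from early `return` to `goto out`, so the reference is dropped on the error path as well.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical user-space model of a reference-counted forcewake domain.
 * None of these names are the real i915 API; they only mirror its shape.
 */
struct fw_domain {
	unsigned int refcount; /* outstanding get() calls */
	unsigned int wakeups;  /* times the (modelled) GT was actually woken */
};

static void fw_get(struct fw_domain *fw)
{
	/* Only the 0 -> 1 transition wakes the hardware. */
	if (fw->refcount++ == 0)
		fw->wakeups++;
}

static void fw_put(struct fw_domain *fw)
{
	/* Unbalanced puts are a bug; the 1 -> 0 transition lets it sleep. */
	assert(fw->refcount > 0);
	fw->refcount--;
}

/* A register write that wraps itself in forcewake, as an otherwise
 * unprotected access would have to. The write itself is elided. */
static void reg_write(struct fw_domain *fw, unsigned int reg, unsigned int val)
{
	(void)reg;
	(void)val;
	fw_get(fw);
	fw_put(fw);
}

/* Stand-in for ring->init_hw(); fails with -EIO on ring `fail_at`. */
static int model_ring_init_hw(struct fw_domain *fw, int ring, int fail_at)
{
	reg_write(fw, 0x2000u + (unsigned int)ring, 1);
	return ring == fail_at ? -EIO : 0;
}

/* Same shape as the patched i915_gem_init_hw(): one reference taken up
 * front, released on every exit path through a single label. */
static int model_gem_init_hw(struct fw_domain *fw, int nrings, int fail_at)
{
	int ret = 0;

	fw_get(fw);
	for (int i = 0; i < nrings; i++) {
		ret = model_ring_init_hw(fw, i, fail_at);
		if (ret)
			goto out;
	}
out:
	fw_put(fw);
	return ret;
}

/* Outer layer, like i915_gem_init(): covers the register accesses made
 * before init_hw, and nests around init_hw's own reference. */
static int model_gem_init(struct fw_domain *fw, int nrings, int fail_at)
{
	int ret;

	fw_get(fw);
	reg_write(fw, 0x1000u, 1); /* access made before init_hw() */
	ret = model_gem_init_hw(fw, nrings, fail_at);
	fw_put(fw);
	return ret;
}
```

In this model, four standalone writes wake the GT four times, while model_gem_init() over four rings wakes it once, and a ring failure still leaves the count back at zero. Whether that single wakeup is also what avoids the stale-TLB symptom is the empirical observation from the report; the model does not demonstrate it.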

> Also the forcewake hack in the ring
> init code should now be redundant, too.

I am of the opinion that they still have documentary value, unless you
have an assert_force_wake() handy to replace them.
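An assert_force_wake()-style helper would let the ring init code document its requirement by checking, rather than taking, forcewake. A minimal sketch, assuming a plain reference count; `fw_state`, `assert_forcewake_held()` and `model_ring_init_hw()` are hypothetical names, not existing i915 code, and a kernel version would more likely use WARN_ON() than print to stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch of an assert_force_wake()-style helper. The struct
 * and function names are illustrative only, not existing i915 code.
 */
struct fw_state {
	unsigned int refcount;   /* outstanding forcewake references */
	unsigned int violations; /* checks that found forcewake not held */
};

static void fw_state_get(struct fw_state *fw)
{
	fw->refcount++;
}

static void fw_state_put(struct fw_state *fw)
{
	assert(fw->refcount > 0);
	fw->refcount--;
}

/* Check, rather than take, forcewake: documents the caller's contract
 * and reports when it is broken. Returns true if forcewake is held. */
static bool assert_forcewake_held(struct fw_state *fw, const char *where)
{
	if (fw->refcount > 0)
		return true;
	fw->violations++;
	fprintf(stderr, "%s: register access without forcewake\n", where);
	return false;
}

/* A ring init step that relies on its caller holding forcewake instead
 * of taking its own reference. Register writes are elided. */
static void model_ring_init_hw(struct fw_state *fw)
{
	assert_forcewake_held(fw, __func__);
}
```

The design choice is that the check costs nothing when the outer i915_gem_init()/i915_gem_init_hw() reference is held, yet still flags any future caller that reaches the ring code without it.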
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx