Re: [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Fri, 5 Dec 2014 16:26:23 +0000



On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > Currently we initialise the rings, add the first context switch to the
> > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > MI_LRI, we end up executing the context switch and golden render state
> > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > the ppgtt PD setup, then set the default context and write the commands
> > > > to run the render state into the ring, before we activate the ring. This
> > > > allows us to be sure that the register state is valid before we begin
> > > > execution.
> > > > 
> > > > This was spotted when writing the seqno/request conversion, but only with
> > > > the ERROR capture did I realise that it was a necessity now.
> > > > 
> > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index c1c11418231b..c13842d3cbc9 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > >  	 */
> > > >  	init_unused_rings(dev);
> > > >  
> > > > -	for_each_ring(ring, dev_priv, i) {
> > > > -		ret = ring->init_hw(ring);
> > > > -		if (ret)
> > > > -			return ret;
> > > > -	}
> > > > -
> > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > 
> > > This is going to assume ring->head/tail are already valid?
> > 
> > We write into the ring obj, not the ring itself, which should be setup
> > during the various intel_init_engine, i.e. the backing storage is
> > independent of the actual registers.
> 
> I mean the software shadows, not the registers themselves. When the GPU
> hangs I expect rign->head != ring->tail. So what makes those two identical
> again after the GPU reset?

Why would they be equal after reset? At the moment, we discard all
outstanding requests which makes them equal. We have been discussing the
need to execute pending batches from non-guitly Batches for yonks, in
which case we would not expect ring->head == ring->tail after a reset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx