[PATCH] drm/i915: Unconditionally flush residual writes before batches

chris at chris-wilson.co.uk (Chris Wilson) · Thu, 12 Jul 2012 11:10:37 +0100

On Thu, 12 Jul 2012 11:00:17 +0200, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Thu, Jul 12, 2012 at 09:29:42AM +0100, Chris Wilson wrote:
> > During batch buffer emission we flag that the next request should
> > generate a flush. The intention is that we queue a request following the
> > batch buffer in order to provide the breadcrumb, and so always flush all
> > caches after every batch, killing the flushing list. However, X
> > carefully schedules a signal to arrive just as we are checking whether
> > the ring has enough space to enqueue the flush and so we abort that
> > addition with an ERESTARTSYS. The next batch then clears the write_domain,
> > and we end up with an object with no write_domain on the ring's
> > gpu_write_list. Havoc WARNs and BUGs ensue.
> > 
> > The gpu_write_list is doomed as it is merely a remnant of the flushing
> > list, so the easiest fix in the meantime is to clear all residual
> > members from the gpu_write_list prior to updating the write_domain on
> > the next batch.
> > 
> > Fixes regression from
> > 
> > commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
> > Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Date:   Wed Jun 13 20:45:19 2012 +0200
> > 
> >     drm/i915: disable flushing_list/gpu_write_lis
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc:  Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> Thinking about this some more, this could actually be a regression from
> 
> commit de2b998552c1534e87bfbc51ec5734b02bc89020
> Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> Date:   Wed Jul 4 22:52:50 2012 +0200
> 
>     drm/i915: don't return a spurious -EIO from intel_ring_begin
> 
> Only with that patch does intel_ring_begin return -ERESTARTSYS.

Ok, that would better fit in with the discovery timeline, and indeed it
was a side-effect of the "not yet audited all the paths...". How true
you were. :)
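
For anyone following along, the failure sequence from the commit message can be modelled outside the driver. This is only a toy sketch, not driver code: every toy_* name is hypothetical, the object is reduced to a write_domain plus a list-membership flag, and the "signal" is a boolean. It shows why flushing residual members before updating the write_domain avoids an object sitting on the list with no write_domain.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's ERESTARTSYS; only "non-zero error" matters here. */
#define TOY_ERESTARTSYS 512

/* Toy model of a GEM object: just the two pieces of state the bug involves. */
struct toy_obj {
	unsigned write_domain;   /* non-zero while the GPU may hold dirty data */
	bool on_gpu_write_list;  /* whether the object sits on the ring's write list */
};

/* Retire the pending write and drop the object from the list. */
static void toy_flush_residual(struct toy_obj *obj)
{
	obj->write_domain = 0;
	obj->on_gpu_write_list = false;
}

/* Queue the post-batch flush; aborted if a signal arrives first. */
static int toy_queue_flush(struct toy_obj *obj, bool signal_pending)
{
	if (signal_pending)
		return -TOY_ERESTARTSYS;
	toy_flush_residual(obj);
	return 0;
}

/* Emit a batch; flush_first models clearing residual list members beforehand. */
static void toy_emit_batch(struct toy_obj *obj, unsigned write_domain,
			   bool flush_first)
{
	if (flush_first && obj->on_gpu_write_list)
		toy_flush_residual(obj);
	obj->write_domain = write_domain;
	if (write_domain)
		obj->on_gpu_write_list = true;
}

/* The state the WARNs and BUGs complain about: listed, but nothing to flush. */
static bool toy_inconsistent(const struct toy_obj *obj)
{
	return obj->on_gpu_write_list && obj->write_domain == 0;
}

/* Batch that writes, interrupted flush, then a batch that only reads. */
static bool toy_run(bool flush_first)
{
	struct toy_obj obj = { 0, false };

	toy_emit_batch(&obj, 1, flush_first);
	(void)toy_queue_flush(&obj, true);
	toy_emit_batch(&obj, 0, flush_first);
	return toy_inconsistent(&obj);
}
```

In this model toy_run(false) ends with the object still listed and no write_domain, while toy_run(true) leaves the list clean.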

> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 88e2e11..4c521df 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
> >  			return ret;
> >  	}
> >  
> > -	/* Unconditionally invalidate gpu caches. */
> > -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> > +	/* Unconditionally invalidate gpu caches and flush residual writes. */
> 
> Hm, can we have a big FIXME here that we should change this back once the
> gpu_write_list is properly buried?

I fully expect i915_gem_flush_ring() to die in a fire once the flushing
list removal is complete. There will only be two callsites, both in
i915_gem_execbuffer.c; one to invalidate and one to flush. So we might
as well call them thus. If you want to tag an XXX in there, feel free,
I was just trying to keep inside 80 cols. ;-)
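
To illustrate what "call them thus" might look like, here is a rough sketch of two thin wrappers over a single flush helper. It assumes the (ring, invalidate_domains, flush_domains) argument order seen in the removed line of the diff; all toy_* names and the TOY_GPU_DOMAINS value are made up for illustration and are not the driver's API.

```c
/* Hypothetical domain mask; the driver's I915_GEM_GPU_DOMAINS is not reproduced here. */
#define TOY_GPU_DOMAINS 0x3eu

/* Toy ring that just records what the last emitted flush command asked for. */
struct toy_ring {
	unsigned last_invalidate;
	unsigned last_flush;
	int emits;
};

/* Stand-in for the combined helper: emit nothing if no GPU domain is involved. */
static int toy_flush_ring(struct toy_ring *ring, unsigned invalidate_domains,
			  unsigned flush_domains)
{
	if (((invalidate_domains | flush_domains) & TOY_GPU_DOMAINS) == 0)
		return 0;
	ring->last_invalidate = invalidate_domains;
	ring->last_flush = flush_domains;
	ring->emits++;
	return 0;
}

/* Before a batch: invalidate GPU caches, flush nothing. */
static int toy_ring_invalidate(struct toy_ring *ring)
{
	return toy_flush_ring(ring, TOY_GPU_DOMAINS, 0);
}

/* After a batch: flush GPU writes, invalidate nothing. */
static int toy_ring_flush(struct toy_ring *ring)
{
	return toy_flush_ring(ring, 0, TOY_GPU_DOMAINS);
}
```

With only two callers, naming each by intent keeps the call lines short and makes the direction of the cache operation obvious at the callsite.
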
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre