Re: [PATCH] drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Fri, 24 Oct 2014 17:39:59 +0300

On Fri, Oct 24, 2014 at 04:28:50PM +0200, Daniel Vetter wrote:
> On Tue, May 27, 2014 at 09:33:09PM +0300, ville.syrjala@xxxxxxxxxxxxxxx wrote:
> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > 
> > During a GPU reset we need to get pending page flip cleared out
> > since the ring contents are gone and flip will never complete
> > on its own. This used to work until the mmio vs. CS flip race
> > detection came about. That piece of code is looking for a
> > specific surface address in the SURFLIVE register, but as
> > a flip to that address may never happen the check may never
> > pass. So we should just skip the SURFLIVE and flip counter
> > checks when the GPU gets reset.
> > 
> > intel_display_handle_reset() tries to effectively complete
> > the flip anyway by calling .update_primary_plane(). But that
> > may not satisfy the conditions of the mmio vs. CS race
> > detection since there's no guarantee that a modeset didn't
> > sneak in between the GPU reset and intel_display_handle_reset().

Such a modeset will not wait for pending flips due to the ongoing GPU
reset, and then the primary plane updates ...

> > When that happens the primary plane updates performed by
> > intel_display_handle_reset() will already use the new surface
> > address, and thus the surface address the flip is waiting for
> > might never appear in SURFLIVE. The result is that the flip
> > will never complete and attempts to perform further page flips
> > will fail with -EBUSY.
> > 
> > During the GPU reset intel_crtc_has_pending_flip() will return
> > false regardless, so the deadlock with a modeset vs. the error
> > work acquiring crtc->mutex was avoided. And the reset_counter
> > check in intel_crtc_has_pending_flip() actually made this bug
> > even less severe since it allowed normal modesets to go through
> > even though there's a pending flip.
> > 
> > This is a regression introduced by me here:
> >  commit 75f7f3ec600524c9544cc31695155f1a9ddbe1d9
> >  Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> >  Date:   Tue Apr 15 21:41:34 2014 +0300
> > 
> >     drm/i915: Fix mmio vs. CS flip race on ILK+
> > 
> > Testcase: igt/kms_flip/flip-vs-panning-vs-hang
> > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> 
> So I wondered first how we'll get there since the set_base code waits for
> pageflips before updating anything. Except it doesn't when there's a
> pending gpu reset. I don't think we need Chris' additional paranoi since
> we must complete the pageflip anyway, and we do it with hacks when there's
> a gpu reset. So if the commit message gets augmented with some note that
> the pageflip wait in set_base is short-circuited for resets this is
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> 
> Cheers, Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 94ac51f..cb9dd8e 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -8895,6 +8895,10 @@ static bool page_flip_finished(struct intel_crtc *crtc)
> >  	struct drm_device *dev = crtc->base.dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  
> > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > +	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> > +		return true;
> > +
> >  	/*
> >  	 * The relevant registers doen't exist on pre-ctg.
> >  	 * As the flip done interrupt doesn't trigger for mmio
> > -- 
> > 1.8.5.5
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx