Re: [PATCH 1/2] drm/i915: Stop using DC states if firmware disagrees on the state

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 20 Jan 2016 10:17:27 +0000



On Wed, Jan 20, 2016 at 11:49:51AM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > On Tue, Jan 19, 2016 at 09:50:08PM +0200, Mika Kuoppala wrote:
> >> Sometimes we get dmesg warnings claiming that DC6 was already
> >> enabled prior to our enabling. Investigations using readback of
> >> the written dc state value indicate that sometimes when we disable
> >> the dc6, the write doesn't stick, or it does but for a very short time.
> >> 
> >> This leads to state keeping confusion between firmware and driver,
> >> driver thinking that dc6 is off and proceeds with modesets
> >> while firmware still keeps/thinks dc6 is on. Evidence from logs
> >> suggests that the dc6 is still on as flips start to timeout,
> >> leading to eventual gpu hang.
> >> 
> >> Further complication is that to recover the hang, we reset while
> >> the DC6 is enabled. With this state on the reinit of the gpu side
> >> won't come out clean and rings rehang right after the init.
> >> 
> >> Harden the detection of this situation and immediately disable
> >> DC6 if we disagree on state with the firmware. This should make it
> >> less probable for driver to do end up in a wrong conclusion and
> >> let things like modeset and gpu reset proceed with dc6 still
> >> enabled.
> >> 
> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> >> Suggested-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Cc: Patrik Jakobsson <patrik.jakobsson@xxxxxxxxxxxxxxx>
> >> Cc: Imre Deak <imre.deak@xxxxxxxxx>
> >> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/i915/intel_runtime_pm.c | 31 +++++++++++++++++++++++++++----
> >>  1 file changed, 27 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> index bbca527184d0..5a21453e38e1 100644
> >> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> >> @@ -470,9 +470,23 @@ static void gen9_set_dc_state_debugmask_memory_up(
> >>  	}
> >>  }
> >>  
> >> +static int gen9_dc_state_verified(struct drm_i915_private *dev_priv,
> >> +				  u32 mask, u32 state)
> >> +{
> >> +	int i;
> >> +
> >> +	/* Observe the dc state with handful of reads */
> >> +	for (i = 0; i < 3; i++) {
> >> +		if (wait_for((I915_READ(DC_STATE_EN) & mask) == state, 100))
> >> +			return false;
> >> +	}
> >
> > Why two loops of reads?
> >
> 
> I once saw the write stick but for a very short time, so with one wait_for we
> might miss it.
> 
> > Since this is equivalent to
> > 	return wait_for((I915_READ(DC_STATE_EN) & mask) == state, 300) == 0;
> >
> > In this case
> >
> > WRITE(DC_STATE_EN, val & ~mask | state);
> > if (wait_for((READ(DC_STATE_END) & mask) = state, 300) {
> > 	/* oh noes */
> > }
> >
> > is more obvious (since we verify with a read of the same register, not
> > some magic ack sequence).
> 
> I could open code the 2 checks here, would be more obvious yes.
> 
> Btw what led you to think that the state doesn't stick? Looking
> at the code for correctness and then deducting that if that warn
> happens, there is no other explanation?

Yes. It has also been used in the past as a form of acknowledging state
changes, though usually it is a separate bit (i.e. a request bit and a
current state bit).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx