Re: [PATCH] drm/i915/vlv: assert and de-assert sideband reset at boot and resume v3

Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> · Wed, 21 May 2014 11:06:45 -0700

On Wed, 21 May 2014 20:54:03 +0300
Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> wrote:

> On Tue, May 20, 2014 at 11:09:03AM -0700, Jesse Barnes wrote:
> > This is a bit like the CMN reset de-assert we do in DPIO_CTL, except
> > that it resets the whole common lane section of the PHY.  This is
> > required on machines where the BIOS doesn't do this for us on boot or
> > resume to properly re-calibrate and get the PHY ready to transmit data.
> > 
> > Without this patch, such machines won't resume correctly much of the time,
> > with the symptom being a 'port ready' timeout and/or a link training
> > failure.
> > 
> > v2: extract simpler set_power_well function for use in reset_dpio (Imre)
> >     move to reset_dpio (Daniel & Ville)
> > v3: don't reset if DPIO reset is already de-asserted (Imre)
> > 
> > Signed-off-by: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 19 +++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_drv.h     |  3 ++-
> >  drivers/gpu/drm/i915/intel_pm.c      | 13 ++++++++++---
> >  3 files changed, 31 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index df58afc..bdb4624 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -1509,6 +1509,25 @@ static void intel_reset_dpio(struct drm_device *dev)
> >  
> >  	} else {
> >  		/*
> > +		 * If DPIO has already been reset, e.g. by BIOS, just skip all
> > +		 * this.
> > +		 */
> > +		if (I915_READ(DPIO_CTL) & DPIO_CMNRST)
> > +			return;
> > +
> > +		/*
> > +		 * From VLV2A0_DP_eDP_HDMI_DPIO_driver_vbios_notes_11.docx:
> > +		 * Need to assert and de-assert PHY SB reset by gating the
> > +		 * common lane power, then un-gating it.
> > +		 * Simply ungating isn't enough to reset the PHY enough to get
> > +		 * ports and lanes running.
> > +		 */
> > +		__vlv_set_power_well(dev_priv, PUNIT_POWER_WELL_DPIO_CMN_BC,
> > +				     false);
> > +		__vlv_set_power_well(dev_priv, PUNIT_POWER_WELL_DPIO_CMN_BC,
> > +				     true);
> 
> Hmm. I wonder how the power well hack in intel_uncore_sanitize() ties in
> with this. We should definitely rip that out regardless.
> 
> Another thing I'm wodering is did the BIOS/hw really power on the
> common lane, or did we do that outselves? If the latter, then I wonder
> if we simply do that too early. Or more precisely do we need to make
> sure the cri clock and/or refclock are enabled before we power on the
> common lane?
> 
> And third is that do we need to enable the TX wells before the CMN
> well? Currently we do the opposite which could also explain why this
> CMN well toggle fixes things.
> 
> And finally we should make sure the cmnreset assert/deassert happens
> corecctly wrt. to the power well transitions.
> 
> 
> So I'd really like to see the following measures taken:
> 
> - kill power well hack from intel_uncore_sanitize()
> 
> - move the DPLL register setup from intel_reset_dpio() into
>   vlv_set_power_well() like so:
> 
>  vlv_set_power_well()
>  {
> +	if (power_well_id == PUNIT_POWER_WELL_DPIO_CMN_BC && enable) {
> +		I915_WRITE(DPLL, ...);
> +		POSTING_READ(DPLL);
> +		udelay(1); /* >10ns for cmnreset, >0ns for sidereset */
> +	}
> 
> 	<really enable the power well>
>  }
> 
> - reorder vlv_power_wells[] to bring up TX wells before CMN.
>   Although maybe the DPLL stuff should then be done already before
>   the TX wells are brought up? But maybe not.
> 
> - move the cmnreset assert/deassert into the power well code like so:
> 
>  vlv_set_power_well()
>  {
> +	if (power_well_id == PUNIT_POWER_WELL_DPIO_CMN_BC && !enable)
> +		assert cmnreset
> 
> 	<do stuff>
> 
> +	if (power_well_id == PUNIT_POWER_WELL_DPIO_CMN_BC && enable) 
> +		deassert cmnreset
>  }
> 
> 
> If those measures aren't enough, then I think this patch would be OK.
> But right now I have this nagging feeling that we may have shot
> ourselves in the foot and we're about to pile workarounds on top of our
> own bug.
> 

I worry about this too, but on the flip side we only have the
incomplete doc to look at, and these bugs can be really hard to find
and test fixes for (this reset one in particular bit only some
machines on only some boots for example).

So going with something untested like the above may make more sense,
but may also give us a false sense of security wrt this issue.

Can you ping the PHY guys with the above and see what they have to
say?  Maybe in the context of the updated programming doc stuff I sent
out?  Ideally we could get firm answers, update the docs, then update
the code.

In the meantime, upstream will be broken without a tested fix on these
machines.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx