On Wed, Jan 25, 2017 at 02:31:08PM +0200, Mika Kuoppala wrote:
> Certain Baytrails, namely the 4 cpu core variants, have been
> plagued by spurious system hangs, mostly occurring with light loads.
>
> Multiple bisects by various people point to a commit which changes the
> reclocking strategy for Baytrail to follow its bigger brethren:
> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
>
> There is also a review comment attached to this commit from Deepak S
> on avoiding punit access on Cherryview and thus it is excluded on
> common reclocking path. By taking the same approach and omitting
> the punit access by not tweaking the thresholds when the hardware
> has been asked to move into different frequency, considerable gains
> in stability have been observed.
>
> With J1900 box, light render/video load would end up in system hang
> in usually less than 12 hours. With this patch applied, the cumulative
> uptime has now been 34 days without issues. To provoke system hang,
> light loads on both render and bsd engines in parallel have been used:
> glxgears >/dev/null 2>/dev/null &
> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
>
> So far, author has not witnessed system hang with above load
> and this patch applied. Reports from the tenacious people at
> kernel bugzilla are also promising.
>
> Considering that the punit access frequency with this patch is
> considerably less, there is a possibility that this will push
> the, still unknown, root cause past the triggering point on most loads.
> Further work on investigating the punit accesses on byt is welcomed.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: fritsch@xxxxxxxx
> Cc: miku@xxxxxx
> Cc: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxxxxxxxxx>
> CC: Michal Feix <michal@xxxxxxx>
> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
> Cc: Deepak S <deepak.s@xxxxxxxxxxxxxxx>
> Cc: Jarkko Nikula <jarkko.nikula@xxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.2+
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

It sucks, but I guess this is better than dead machines. I'd say let's
wait another 1-2 weeks for tested-bys to trickle in, and if it does fix
the problem then let's apply it. rps keeps on sucking, that's
unfortunately not news at all.

Acked-by: Daniel Vetter <daniel.vetter@xxxxxxxx>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  drivers/gpu/drm/i915/i915_reg.h | 2 ++
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 3fc286c..4b9635f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1039,7 +1039,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>  	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>  		if (!vlv_c0_above(dev_priv,
>  				  &dev_priv->rps.down_ei, &now,
> -				  dev_priv->rps.down_threshold))
> +				  VLV_RP_DOWN_EI_THRESHOLD))
>  			events |= GEN6_PM_RP_DOWN_THRESHOLD;
>  		dev_priv->rps.down_ei = now;
>  	}
> @@ -1047,7 +1047,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>  	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>  		if (vlv_c0_above(dev_priv,
>  				 &dev_priv->rps.up_ei, &now,
> -				 dev_priv->rps.up_threshold))
> +				 VLV_RP_UP_EI_THRESHOLD))
>  			events |= GEN6_PM_RP_UP_THRESHOLD;
>  		dev_priv->rps.up_ei = now;
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 70d9616..09f6aea 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -787,6 +787,8 @@ enum skl_disp_power_wells {
>  #define CHV_BIAS_CPU_50_SOC_50	(3 << 2)
>
>  #define VLV_CZ_CLOCK_TO_MILLI_SEC	100000
> +#define VLV_RP_UP_EI_THRESHOLD	90
> +#define VLV_RP_DOWN_EI_THRESHOLD	70
>
>  /* vlv2 north clock has */
>  #define CCK_FUSE_REG	0x8
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index db24f89..1923b6b 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4983,7 +4983,7 @@ static void valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
>
>  	if (val != dev_priv->rps.cur_freq) {
>  		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
> -		if (!IS_CHERRYVIEW(dev_priv))
> +		if (!(IS_CHERRYVIEW(dev_priv) || IS_VALLEYVIEW(dev_priv)))
>  			gen6_set_rps_thresholds(dev_priv, val);
>  	}
>
> --
> 2.7.4
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
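
For anyone following the vlv_wa_c0_ei() hunks in the quoted patch, here
is a rough, standalone sketch of what comparing against the fixed 90/70
thresholds amounts to. It is not the driver code: struct ei_sample, the
SKETCH_EVENT_* flags and both helper functions are made up for
illustration, and the real vlv_c0_above() works on C0 residency counters
and the CZ clock rather than plain tick counts. Only the two threshold
values come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold percentages as defined by the quoted patch. */
#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical stand-ins for GEN6_PM_RP_{UP,DOWN}_THRESHOLD. */
#define SKETCH_EVENT_UP   (1u << 0)
#define SKETCH_EVENT_DOWN (1u << 1)

/* Hypothetical sample of one evaluation interval (not the driver's struct). */
struct ei_sample {
	uint64_t elapsed; /* interval length, arbitrary ticks */
	uint64_t busy;    /* ticks the GPU was busy, same unit */
};

/*
 * True if the busy share of the interval is at least threshold_pct.
 * Integer cross-multiplication avoids division; an empty interval is
 * treated as "not above". Assumes values small enough not to overflow.
 */
static bool busy_at_or_above(const struct ei_sample *s,
			     unsigned int threshold_pct)
{
	if (s->elapsed == 0)
		return false;

	return s->busy * 100 >= s->elapsed * threshold_pct;
}

/*
 * Mirrors the shape of the two hunks: request an upclock when the up
 * interval is at or above the fixed up threshold, and a downclock when
 * the down interval is below the fixed down threshold.
 */
static unsigned int sketch_events(const struct ei_sample *up,
				  const struct ei_sample *down)
{
	unsigned int events = 0;

	if (busy_at_or_above(up, VLV_RP_UP_EI_THRESHOLD))
		events |= SKETCH_EVENT_UP;
	if (!busy_at_or_above(down, VLV_RP_DOWN_EI_THRESHOLD))
		events |= SKETCH_EVENT_DOWN;

	return events;
}
```

With these constants, a sample that is between 70% and 90% busy raises
neither event, so the frequency stays where it is. Because the
thresholds no longer depend on the current frequency, nothing needs
recomputing on each frequency change, which is what lets
valleyview_set_rps() skip gen6_set_rps_thresholds() in the patch.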