Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Feb 15, 2017 at 02:37:50PM +0200, Mika Kuoppala wrote:
>> Certain Baytrails, namely the 4 cpu core variants, have been
>> plagued by spurious system hangs, mostly occurring with light loads.
>>
>> Multiple bisects by various people point to a commit which changes the
>> reclocking strategy for Baytrail to follow its bigger brethren:
>> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
>>
>> There is also a review comment attached to this commit from Deepak S
>> on avoiding punit access on Cherryview and thus it was excluded on
>> common reclocking path. By taking the same approach and omitting
>> the punit access by not tweaking the thresholds when the hardware
>> has been asked to move into different frequency, considerable gains
>> in stability have been observed.
>>
>> With J1900 box, light render/video load would end up in system hang
>> in usually less than 12 hours. With this patch applied, the cumulative
>> uptime has now been 34 days without issues. To provoke system hang,
>> light loads on both render and bsd engines in parallel have been used:
>> glxgears >/dev/null 2>/dev/null &
>> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
>>
>> So far, author has not witnessed system hang with above load
>> and this patch applied. Reports from the tenacious people at
>> kernel bugzilla are also promising.
>>
>> Considering that the punit access frequency with this patch is
>> considerably less, there is a possibility that this will push
>> the, still unknown, root cause past the triggering point on most loads.
>>
>> But as we now can reliably reproduce the hang independently,
>> we can reduce the pain that users are having and use
>> static thresholds until a root cause is found.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
>> Cc: Len Brown <len.brown@xxxxxxxxx>
>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
>> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
>> Cc: fritsch@xxxxxxxx
>> Cc: miku@xxxxxx
>> Cc: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxxxxxxxxx>
>> CC: Michal Feix <michal@xxxxxxx>
>> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
>> Cc: Deepak S <deepak.s@xxxxxxxxxxxxxxx>
>> Cc: Jarkko Nikula <jarkko.nikula@xxxxxxxxxxxxxxx>
>> Cc: <stable@xxxxxxxxxxxxxxx> # v4.2+
>> Acked-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>>  drivers/gpu/drm/i915/i915_reg.h | 2 ++
>>  drivers/gpu/drm/i915/intel_pm.c | 6 +++++-
>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index a887aef..319c02d 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -1095,7 +1095,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>>          if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>>                  if (!vlv_c0_above(dev_priv,
>>                                    &dev_priv->rps.down_ei, &now,
>> -                                  dev_priv->rps.down_threshold))
>> +                                  VLV_RP_DOWN_EI_THRESHOLD))
>>                          events |= GEN6_PM_RP_DOWN_THRESHOLD;
>>                  dev_priv->rps.down_ei = now;
>>          }
>> @@ -1103,7 +1103,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>>          if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>>                  if (vlv_c0_above(dev_priv,
>>                                   &dev_priv->rps.up_ei, &now,
>> -                                 dev_priv->rps.up_threshold))
>> +                                 VLV_RP_UP_EI_THRESHOLD))
>
> A patch to set them as we set the default values during rps enable so
> that you don't break the debug interfaces.
>
>>                          events |= GEN6_PM_RP_UP_THRESHOLD;
>>                  dev_priv->rps.up_ei = now;
>>          }
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 141a5c1..1297f6a 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -1135,6 +1135,8 @@ enum skl_disp_power_wells {
>>  #define CHV_BIAS_CPU_50_SOC_50 (3 << 2)
>>
>>  #define VLV_CZ_CLOCK_TO_MILLI_SEC 100000
>> +#define VLV_RP_UP_EI_THRESHOLD 90
>> +#define VLV_RP_DOWN_EI_THRESHOLD 70
>>
>>  /* vlv2 north clock has */
>>  #define CCK_FUSE_REG 0x8
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 3d311e1..bce6aae 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -4971,7 +4971,11 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
>>          if (err)
>>                  return err;
>>
>> -        gen6_set_rps_thresholds(dev_priv, val);
>> +        /* When byt can survive without system hang with dynamic
>> +         * sw freq adjustments, this restriction can be lifted.
>> +         */
>> +        if (!IS_VALLEYVIEW(dev_priv))
>
> Are all vlv affected?

Not all. From what I have gathered, the 4 core variants are the
susceptible ones. For example N28xx works, N29xx freezes.

-Mika

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
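
For anyone reading along without the driver source at hand, here is a
rough sketch of what fixed thresholds mean for the up/down decision made
in the vlv_wa_c0_ei() hunks quoted above. It is not the i915 code. It
assumes the two new defines are busy percentages over one evaluation
interval, and it folds the separate up and down interval samples into a
single sample for brevity. Everything named sketch_* or SKETCH_* is
hypothetical; only the two threshold values come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the patch; treated as busy percentages (assumption). */
#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical event bits for this sketch, not the hardware bits. */
#define SKETCH_EVENT_UP   (1u << 0)
#define SKETCH_EVENT_DOWN (1u << 1)

/* Hypothetical sample: busy and elapsed time over one interval. */
struct sketch_ei {
	uint64_t busy;    /* time the GPU was active */
	uint64_t elapsed; /* length of the evaluation interval */
};

/* True when busy time is at least 'threshold' percent of elapsed time. */
static bool sketch_above(const struct sketch_ei *ei, unsigned int threshold)
{
	if (ei->elapsed == 0)
		return false; /* no usable sample: treat as not above */

	return ei->busy * 100 >= ei->elapsed * threshold;
}

/* Same shape as the quoted hunks: down if not above 70, up if above 90. */
static uint32_t sketch_events(const struct sketch_ei *ei)
{
	uint32_t events = 0;

	if (!sketch_above(ei, VLV_RP_DOWN_EI_THRESHOLD))
		events |= SKETCH_EVENT_DOWN;

	if (sketch_above(ei, VLV_RP_UP_EI_THRESHOLD))
		events |= SKETCH_EVENT_UP;

	return events;
}
```

With constant thresholds the band between 70 and 90 produces no event,
so the requested frequency stays put, and nothing has to be reprogrammed
when the frequency changes.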