Re: [PATCH] drm/i915/byt: Avoid tweaking evaluation thresholds

On Wed, Jan 25, 2017 at 02:31:08PM +0200, Mika Kuoppala wrote:
> Certain Baytrails, namely the 4 CPU core variants, have been
> plagued by spurious system hangs, mostly occurring with light loads.
> 
> Multiple bisects by various people point to a commit which changes the
> reclocking strategy for Baytrail to follow its bigger brethren:
> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
> 
> There is also a review comment attached to this commit from Deepak S
> on avoiding punit access on Cherryview, and thus it is excluded on
> the common reclocking path. By taking the same approach and omitting
> the punit access, by not tweaking the thresholds when the hardware
> has been asked to move to a different frequency, considerable gains
> in stability have been observed.
> 
> With a J1900 box, a light render/video load would end up in a system hang,
> usually in less than 12 hours. With this patch applied, the cumulative
> uptime has now been 34 days without issues. To provoke a system hang,
> light loads on both the render and bsd engines in parallel have been used:
> glxgears >/dev/null 2>/dev/null &
> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
> 
> So far, the author has not witnessed a system hang with the above load
> and this patch applied. Reports from the tenacious people at
> kernel bugzilla are also promising.
> 
> Considering that the punit access frequency with this patch is
> considerably lower, there is a possibility that this will push
> the still unknown root cause past the triggering point on most loads.
> Further work on investigating the punit accesses on byt is welcome.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: fritsch@xxxxxxxx
> Cc: miku@xxxxxx
> Cc: Ezequiel Garcia <ezequiel@xxxxxxxxxxxxxxxxxxxx>
> CC: Michal Feix <michal@xxxxxxx>
> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
> Cc: Deepak S <deepak.s@xxxxxxxxxxxxxxx>
> Cc: Jarkko Nikula <jarkko.nikula@xxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.2+
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

It sucks, but I guess this is better than dead machines. I'd say let's
wait another 1-2 weeks for tested-bys to trickle in, and if it does fix
the problem then let's apply it. rps keeps on sucking, that's
unfortunately not news at all.

Acked-by: Daniel Vetter <daniel.vetter@xxxxxxxx>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  drivers/gpu/drm/i915/i915_reg.h | 2 ++
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 3fc286c..4b9635f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1039,7 +1039,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>  	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>  		if (!vlv_c0_above(dev_priv,
>  				  &dev_priv->rps.down_ei, &now,
> -				  dev_priv->rps.down_threshold))
> +				  VLV_RP_DOWN_EI_THRESHOLD))
>  			events |= GEN6_PM_RP_DOWN_THRESHOLD;
>  		dev_priv->rps.down_ei = now;
>  	}
> @@ -1047,7 +1047,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>  	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>  		if (vlv_c0_above(dev_priv,
>  				 &dev_priv->rps.up_ei, &now,
> -				 dev_priv->rps.up_threshold))
> +				 VLV_RP_UP_EI_THRESHOLD))
>  			events |= GEN6_PM_RP_UP_THRESHOLD;
>  		dev_priv->rps.up_ei = now;
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 70d9616..09f6aea 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -787,6 +787,8 @@ enum skl_disp_power_wells {
>  #define 	CHV_BIAS_CPU_50_SOC_50 (3 << 2)
>  
>  #define VLV_CZ_CLOCK_TO_MILLI_SEC		100000
> +#define VLV_RP_UP_EI_THRESHOLD			90
> +#define VLV_RP_DOWN_EI_THRESHOLD		70
>  
>  /* vlv2 north clock has */
>  #define CCK_FUSE_REG				0x8
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index db24f89..1923b6b 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4983,7 +4983,7 @@ static void valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
>  
>  	if (val != dev_priv->rps.cur_freq) {
>  		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
> -		if (!IS_CHERRYVIEW(dev_priv))
> +		if (!(IS_CHERRYVIEW(dev_priv) || IS_VALLEYVIEW(dev_priv)))
>  			gen6_set_rps_thresholds(dev_priv, val);
>  	}
>  
> -- 
> 2.7.4
> 
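
For readers following the diff, here is a minimal sketch of what comparing
against fixed evaluation thresholds amounts to. It is not the driver code:
struct ei_sample, busy_above(), rps_events() and the EV_* flags are
hypothetical names made up for illustration, and the sketch assumes a single
shared sample, whereas the patched vlv_wa_c0_ei() keeps separate up_ei and
down_ei samples and only checks each when the matching "EI expired" bit is
set. Only the 90 and 70 values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold percentages, as introduced by the patch in i915_reg.h. */
#define VLV_RP_UP_EI_THRESHOLD   90
#define VLV_RP_DOWN_EI_THRESHOLD 70

/* Hypothetical event flags, standing in for the GEN6_PM_RP_* bits. */
#define EV_UP   (1u << 0)
#define EV_DOWN (1u << 1)

/* Hypothetical sample: a clock reading and a busy counter, both in ticks. */
struct ei_sample {
	uint64_t clock;
	uint64_t busy;
};

/* True if busy time over the interval is at least threshold_pct percent. */
static bool busy_above(const struct ei_sample *old,
		       const struct ei_sample *now,
		       unsigned int threshold_pct)
{
	uint64_t elapsed, busy;

	if (old->clock == 0)
		return false; /* no previous sample to compare against */

	assert(now->clock >= old->clock);
	elapsed = now->clock - old->clock;
	busy = now->busy - old->busy;

	/* Integer form of: busy / elapsed >= threshold_pct / 100 */
	return busy * 100 >= elapsed * threshold_pct;
}

/* Decide which events to raise for one evaluation interval. */
static unsigned int rps_events(const struct ei_sample *old,
			       const struct ei_sample *now)
{
	unsigned int events = 0;

	if (busy_above(old, now, VLV_RP_UP_EI_THRESHOLD))
		events |= EV_UP;
	if (!busy_above(old, now, VLV_RP_DOWN_EI_THRESHOLD))
		events |= EV_DOWN;

	return events;
}
```

With constants in place of dev_priv->rps.up_threshold and
dev_priv->rps.down_threshold, the comparison no longer depends on values that
are re-tuned on every frequency change, which is what lets the patch skip
gen6_set_rps_thresholds() on Baytrail.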

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch