Op 18-08-16 om 16:05 schreef Lyude Paul: > On Thu, 2016-08-18 at 09:39 +0200, Maarten Lankhorst wrote: >> Hey, >> >> Op 17-08-16 om 21:55 schreef Lyude: >>> Since the watermark calculations for Skylake are still broken, we're apt >>> to hitting underruns very easily under multi-monitor configurations. >>> While it would be lovely if this was fixed, it's not. Another problem >>> that's been coming from this however, is the mysterious issue of >>> underruns causing full system hangs. An easy way to reproduce this with >>> a skylake system: >>> >>> - Get a laptop with a skylake GPU, and hook up two external monitors to >>> it >>> - Move the cursor from the built-in LCD to one of the external displays >>> as quickly as you can >>> - You'll get a few pipe underruns, and eventually the entire system will >>> just freeze. >>> >>> After doing a lot of investigation and reading through the bspec, I >>> found the existence of the SAGV, which is responsible for adjusting the >>> system agent voltage and clock frequencies depending on how much power >>> we need. According to the bspec: >>> >>> "The display engine access to system memory is blocked during the >>> adjustment time. SAGV defaults to enabled. Software must use the >>> GT-driver pcode mailbox to disable SAGV when the display engine is not >>> able to tolerate the blocking time." >>> >>> The rest of the bspec goes on to explain that software can simply leave >>> the SAGV enabled, and disable it when we use interlaced pipes/have more >>> then one pipe active. >>> >>> Sure enough, with this patchset the system hangs resulting from pipe >>> underruns on Skylake have completely vanished on my T460s. Additionally, >>> the bspec mentions turning off the SAGV with more then one pipe >>> enabled >>> as a workaround for display underruns. While this patch doesn't entirely >>> fix that, it looks like it does improve the situation a little bit so >>> it's likely this is going to be required to make watermarks on Skylake >>> fully functional. >>> >>> This will still need additional work in the future: we shouldn't be >>> enabling the SAGV if any of the currently enabled planes can't enable WM >>> levels that introduce latencies >= 30 µs. >>> >>> Changes since v11: >>> - Add skl_can_enable_sagv() >>> - Make sure we don't enable SAGV when not all planes can enable >>> watermarks >= the SAGV engine block time. I was originally going to >>> save this for later, but I recently managed to run into a machine >>> that was having problems with a single pipe configuration + SAGV. >>> - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit >>> - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE >>> - Move printks outside of mutexes >>> - Don't print error messages twice >>> Changes since v10: >>> - Apparently sandybridge_pcode_read actually writes values and reads >>> them back, despite it's misleading function name. This means we've >>> been doing this mostly wrong and have been writing garbage to the >>> SAGV control. Because of this, we no longer attempt to read the SAGV >>> status during initialization (since there are no helpers for this). >>> - mlankhorst noticed that this patch was breaking on some very early >>> pre-release Skylake machines, which apparently don't allow you to >>> disable the SAGV. To prevent machines from failing tests due to SAGV >>> errors, if the first time we try to control the SAGV results in the >>> mailbox indicating an invalid command, we just disable future attempts >>> to control the SAGV state by setting dev_priv->skl_sagv_status to >>> I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg. >>> - Move mutex_unlock() a little higher in skl_enable_sagv(). This >>> doesn't actually fix anything, but lets us release the lock a little >>> sooner since we're finished with it. >>> Changes since v9: >>> - Only enable/disable sagv on Skylake >>> Changes since v8: >>> - Add intel_state->modeset guard to the conditional for >>> skl_enable_sagv() >>> Changes since v7: >>> - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's >>> all we use it for anyway) >>> - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification >>> - Fix a styling error that snuck past me >>> Changes since v6: >>> - Protect skl_enable_sagv() with intel_state->modeset conditional in >>> intel_atomic_commit_tail() >>> Changes since v5: >>> - Don't use is_power_of_2. Makes things confusing >>> - Don't use the old state to figure out whether or not to >>> enable/disable the sagv, use the new one >>> - Split the loop in skl_disable_sagv into it's own function >>> - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail() >>> Changes since v4: >>> - Use is_power_of_2 against active_crtcs to check whether we have > 1 >>> pipe enabled >>> - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0 >>> enabled >>> - Call skl_sagv_enable/disable() from pre/post-plane updates >>> Changes since v3: >>> - Use time_before() to compare timeout to jiffies >>> Changes since v2: >>> - Really apply minor style nitpicks to patch this time >>> Changes since v1: >>> - Added comments about this probably being one of the requirements to >>> fixing Skylake's watermark issues >>> - Minor style nitpicks from Matt Roper >>> - Disable these functions on Broxton, since it doesn't have an SAGV >>> >>> Signed-off-by: Lyude <cpaul@xxxxxxxxxx> >>> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx> >>> Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> >>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> >>> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> >>> Cc: stable@xxxxxxxxxxxxxxx >>> --- >>> drivers/gpu/drm/i915/i915_drv.h | 7 ++ >>> drivers/gpu/drm/i915/i915_reg.h | 4 + >>> drivers/gpu/drm/i915/intel_display.c | 11 +++ >>> drivers/gpu/drm/i915/intel_drv.h | 3 + >>> drivers/gpu/drm/i915/intel_pm.c | 148 >>> +++++++++++++++++++++++++++++++++++ >>> 5 files changed, 173 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>> b/drivers/gpu/drm/i915/i915_drv.h >>> index 35caa9b..f20530bb 100644 >>> --- a/drivers/gpu/drm/i915/i915_drv.h >>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>> @@ -1949,6 +1949,13 @@ struct drm_i915_private { >>> struct i915_suspend_saved_registers regfile; >>> struct vlv_s0ix_state vlv_s0ix_state; >>> >>> + enum { >>> + I915_SKL_SAGV_UNKNOWN = 0, >>> + I915_SKL_SAGV_DISABLED, >>> + I915_SKL_SAGV_ENABLED, >>> + I915_SKL_SAGV_NOT_CONTROLLED >>> + } skl_sagv_status; >>> + >>> struct { >>> /* >>> * Raw watermark latency values: >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h >>> b/drivers/gpu/drm/i915/i915_reg.h >>> index 7419fbf..be82c49 100644 >>> --- a/drivers/gpu/drm/i915/i915_reg.h >>> +++ b/drivers/gpu/drm/i915/i915_reg.h >>> @@ -7153,6 +7153,10 @@ enum { >>> #define HSW_PCODE_DE_WRITE_FREQ_REQ 0x17 >>> #define DISPLAY_IPS_CONTROL 0x19 >>> #define HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL 0x1A >>> +#define GEN9_PCODE_SAGV_CONTROL 0x21 >>> +#define GEN9_SAGV_DISABLE 0x0 >>> +#define GEN9_SAGV_IS_DISABLED 0x1 >>> +#define GEN9_SAGV_ENABLE 0x3 >>> #define GEN6_PCODE_DATA _MMIO(0x138128) >>> #define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8 >>> #define GEN6_PCODE_FREQ_RING_RATIO_SHIFT 16 >>> diff --git a/drivers/gpu/drm/i915/intel_display.c >>> b/drivers/gpu/drm/i915/intel_display.c >>> index 781d22e..ca4b83f 100644 >>> --- a/drivers/gpu/drm/i915/intel_display.c >>> +++ b/drivers/gpu/drm/i915/intel_display.c >>> @@ -14156,6 +14156,13 @@ static void intel_atomic_commit_tail(struct >>> drm_atomic_state *state) >>> intel_state->cdclk_pll_vco != dev_priv- >>>> cdclk_pll.vco)) >>> dev_priv->display.modeset_commit_cdclk(state); >>> >>> + /* >>> + * SKL workaround: bspec recommends we disable the SAGV >>> when we >>> + * have more then one pipe enabled >>> + */ >>> + if (IS_SKYLAKE(dev_priv) && !skl_can_enable_sagv(state)) >>> + skl_disable_sagv(dev_priv); >>> + >>> intel_modeset_verify_disabled(dev); >>> } >>> >>> @@ -14229,6 +14236,10 @@ static void intel_atomic_commit_tail(struct >>> drm_atomic_state *state) >>> intel_modeset_verify_crtc(crtc, old_crtc_state, crtc- >>>> state); >>> } >>> >>> + if (IS_SKYLAKE(dev_priv) && intel_state->modeset && >>> + skl_can_enable_sagv(state)) >>> + skl_enable_sagv(dev_priv); >>> + >>> drm_atomic_helper_commit_hw_done(state); >>> >>> if (intel_state->modeset) >>> diff --git a/drivers/gpu/drm/i915/intel_drv.h >>> b/drivers/gpu/drm/i915/intel_drv.h >>> index 1c700b0..d203c77 100644 >>> --- a/drivers/gpu/drm/i915/intel_drv.h >>> +++ b/drivers/gpu/drm/i915/intel_drv.h >>> @@ -1722,6 +1722,9 @@ void ilk_wm_get_hw_state(struct drm_device *dev); >>> void skl_wm_get_hw_state(struct drm_device *dev); >>> void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv, >>> struct skl_ddb_allocation *ddb /* out */); >>> +bool skl_can_enable_sagv(struct drm_atomic_state *state); >>> +int skl_enable_sagv(struct drm_i915_private *dev_priv); >>> +int skl_disable_sagv(struct drm_i915_private *dev_priv); >>> uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config); >>> bool ilk_disable_lp_wm(struct drm_device *dev); >>> int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6); >>> diff --git a/drivers/gpu/drm/i915/intel_pm.c >>> b/drivers/gpu/drm/i915/intel_pm.c >>> index b4cf988..fed2bae8 100644 >>> --- a/drivers/gpu/drm/i915/intel_pm.c >>> +++ b/drivers/gpu/drm/i915/intel_pm.c >>> @@ -2860,6 +2860,7 @@ bool ilk_disable_lp_wm(struct drm_device *dev) >>> >>> #define SKL_DDB_SIZE 896 /* in blocks */ >>> #define BXT_DDB_SIZE 512 >>> +#define SKL_SAGV_BLOCK_TIME 30 /* µs */ >>> >>> /* >>> * Return the index of a plane in the SKL DDB and wm result >>> arrays. Primary >>> @@ -2883,6 +2884,153 @@ skl_wm_plane_id(const struct intel_plane *plane) >>> } >>> } >>> >>> +/* >>> + * SAGV dynamically adjusts the system agent voltage and clock frequencies >>> + * depending on power and performance requirements. The display engine >>> access >>> + * to system memory is blocked during the adjustment time. Because of the >>> + * blocking time, having this enabled can cause full system hangs and/or >>> pipe >>> + * underruns if we don't meet all of the following requirements: >>> + * >>> + * - <= 1 pipe enabled >>> + * - All planes can enable watermarks for latencies >= SAGV engine block >>> time >>> + * - We're not using an interlaced display configuration >>> + */ >>> +int >>> +skl_enable_sagv(struct drm_i915_private *dev_priv) >>> +{ >>> + int ret; >>> + >>> + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED || >>> + dev_priv->skl_sagv_status == I915_SKL_SAGV_ENABLED) >>> + return 0; >>> + >>> + DRM_DEBUG_KMS("Enabling the SAGV\n"); >>> + mutex_lock(&dev_priv->rps.hw_lock); >>> + >>> + ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL, >>> + GEN9_SAGV_ENABLE); >>> + >>> + /* We don't need to wait for the SAGV when enabling */ >>> + mutex_unlock(&dev_priv->rps.hw_lock); >>> + >>> + /* >>> + * Some skl systems, pre-release machines in particular, >>> + * don't actually have an SAGV. >>> + */ >>> + if (ret == -ENOSYS) { >>> + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n"); >>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED; >>> + return 0; >>> + } else if (ret < 0) { >>> + DRM_ERROR("Failed to enable the SAGV\n"); >>> + return ret; >>> + } >>> + >>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_ENABLED; >>> + return 0; >>> +} >>> + >>> +static int >>> +skl_do_sagv_disable(struct drm_i915_private *dev_priv) >>> +{ >>> + int ret; >>> + uint32_t temp = GEN9_SAGV_DISABLE; >>> + >>> + ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL, >>> + &temp); >>> + if (ret) >>> + return ret; >>> + else >>> + return temp & GEN9_SAGV_IS_DISABLED; >>> +} >>> + >>> +int >>> +skl_disable_sagv(struct drm_i915_private *dev_priv) >>> +{ >>> + int ret, result; >>> + >>> + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED || >>> + dev_priv->skl_sagv_status == I915_SKL_SAGV_DISABLED) >>> + return 0; >>> + >>> + DRM_DEBUG_KMS("Disabling the SAGV\n"); >>> + mutex_lock(&dev_priv->rps.hw_lock); >>> + >>> + /* bspec says to keep retrying for at least 1 ms */ >>> + ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1); >>> + mutex_unlock(&dev_priv->rps.hw_lock); >>> + >>> + if (ret == -ETIMEDOUT) { >>> + DRM_ERROR("Request to disable SAGV timed out\n"); >>> + return -ETIMEDOUT; >>> + } >>> + >>> + /* >>> + * Some skl systems, pre-release machines in particular, >>> + * don't actually have an SAGV. >>> + */ >>> + if (result == -ENOSYS) { >>> + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n"); >>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED; >>> + return 0; >>> + } else if (result < 0) { >>> + DRM_ERROR("Failed to disable the SAGV\n"); >>> + return result; >>> + } >>> + >>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_DISABLED; >>> + return 0; >>> +} >>> + >>> +bool skl_can_enable_sagv(struct drm_atomic_state *state) >>> +{ >>> + struct drm_device *dev = state->dev; >>> + struct drm_i915_private *dev_priv = to_i915(dev); >>> + struct intel_atomic_state *intel_state = >>> to_intel_atomic_state(state); >>> + struct drm_crtc *crtc; >>> + enum pipe pipe; >>> + int level, plane; >>> + >>> + /* >>> + * SKL workaround: bspec recommends we disable the SAGV when we >>> have >>> + * more then one pipe enabled >>> + * >>> + * If there are no active CRTCs, no additional checks need be >>> performed >>> + */ >>> + if (hweight32(intel_state->active_crtcs) == 0) >>> + return true; >>> + else if (hweight32(intel_state->active_crtcs) > 1) >>> + return false; >>> + >>> + /* Since we're now guaranteed to only have one active CRTC... */ >>> + pipe = ffs(intel_state->active_crtcs) - 1; >>> + crtc = dev_priv->pipe_to_crtc_mapping[pipe]; >>> + >>> + if (crtc->state->mode.flags & DRM_MODE_FLAG_INTERLACE) >>> + return false; >>> + >>> + for_each_plane(dev_priv, pipe, plane) { >>> + /* Skip this plane if it's not enabled */ >>> + if (intel_state->wm_results.plane[pipe][plane][0] == 0) >>> + continue; >>> + >>> + /* Find the highest enabled wm level for this plane */ >>> + for (level = ilk_wm_max_level(dev); >>> + intel_state->wm_results.plane[pipe][plane][level] == >>> 0; >>> + --level); >>> + >>> + /* >>> + * If any of the planes on this pipe don't enable wm levels >>> + * that incur memory latencies higher then 30µs we can't >>> enable >>> + * the SAGV >>> + */ >>> + if (dev_priv->wm.skl_latency[level] < SKL_SAGV_BLOCK_TIME) >>> + return false; >> Shouldn't this check be >= BLOCK_TIME? >> > That's the requirement for the sagv but the conditional here is still correct. > > WM0 - 2ms > WM1 - 5ms > WM2 - 10ms > WM3 - 20ms > WM4+- disabled > > (20ms < BLOCK_TIME) == true, which indicates we don't have any watermark levels > with latency values >= 30ms. We can't enable so return false. > > WM0 - 2ms > WM1 - 5ms > WM2 - 10ms > WM3 - 20ms > WM4 - 33ms > WM5 - 50ms > WM6 - 70ms > WM7 - 99ms > > (99ms < BLOCK_TIME) == false, and 99 >= BLOCK_TIME so we end up returning true > to indicate it's safe to enable. Ah right, thanks for explaining. ~Maarten -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html