On Thu, Apr 21, 2016 at 12:16 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
On Wed, Apr 20, 2016 at 03:23:10PM +0100, Robert Bragg wrote:
> +static int hsw_enable_metric_set(struct drm_i915_private *dev_priv)
> +{
> + int ret = i915_oa_select_metric_set_hsw(dev_priv);
> +
> + if (ret)
> + return ret;
> +
> + I915_WRITE(GDT_CHICKEN_BITS, GT_NOA_ENABLE);
> +
> + /* PRM:
> + *
> + * OA unit is using “crclk” for its functionality. When trunk
> + * level clock gating takes place, OA clock would be gated,
> + * unable to count the events from non-render clock domain.
> + * Render clock gating must be disabled when OA is enabled to
> + * count the events from non-render domain. Unit level clock
> + * gating for RCS should also be disabled.
> + */
> + I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
> + ~GEN7_DOP_CLOCK_GATE_ENABLE));
> + I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
> + GEN6_CSUNIT_CLOCK_GATE_DISABLE));
> +
> + config_oa_regs(dev_priv, dev_priv->perf.oa.mux_regs,
> + dev_priv->perf.oa.mux_regs_len);
> +
> + /* It takes a fairly long time for a new MUX configuration to
> + * be applied after these register writes. This delay
> + * duration was derived empirically based on the render_basic
> + * config but hopefully it covers the maximum configuration
> + * latency...
> + */
> + mdelay(100);
> You really want to busy spin for 100ms? msleep() perhaps!
Ah, oops, I forgot to change this, thanks!
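To make the point about busy-spinning concrete: mdelay() keeps the CPU spinning for the whole wait, while msleep() lets the scheduler run something else. The sketch below is a userspace analogue only, assuming POSIX clock_gettime() and nanosleep(); busy_wait_ms(), sleep_ms() and cpu_cost_ms() are hypothetical helpers, not kernel or i915 code. It measures how much CPU time each style of wait burns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Read clock 'clk' in milliseconds. */
static double clock_ms(clockid_t clk)
{
    struct timespec t;
    clock_gettime(clk, &t);
    return t.tv_sec * 1000.0 + t.tv_nsec / 1e6;
}

/* Userspace stand-in for mdelay(): spin on the clock, CPU busy throughout. */
static void busy_wait_ms(long ms)
{
    double end = clock_ms(CLOCK_MONOTONIC) + ms;
    while (clock_ms(CLOCK_MONOTONIC) < end)
        ; /* burning cycles */
}

/* Userspace stand-in for msleep(): block so the CPU can do other work. */
static void sleep_ms(long ms)
{
    struct timespec req = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };

    assert(ms >= 0);
    /* If a signal interrupts us, nanosleep() stores the remaining time in req. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
}

/* CPU time (ms) this process consumes while running wait(ms). */
static double cpu_cost_ms(void (*wait)(long), long ms)
{
    double start = clock_ms(CLOCK_PROCESS_CPUTIME_ID);
    wait(ms);
    return clock_ms(CLOCK_PROCESS_CPUTIME_ID) - start;
}
```

cpu_cost_ms(busy_wait_ms, 100) should report roughly 100ms of CPU time, while cpu_cost_ms(sleep_ms, 100) should report close to zero. In the driver the equivalent fix is just swapping mdelay(100) for msleep(100), which is only valid in a context that is allowed to sleep.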
> Did you look for some register you can observe the change in when the
> mux is reconfigured? Is even reading one of the OA registers enough?
Although I can't really explain why the delay apparently needs to be quite so long, based on my limited understanding of the NOA microarchitecture involved here it makes some sense to me that there would be a delay, and one that varies somewhat depending on the particular MUX config. Unfortunately, I don't know of a trick for getting explicit feedback that the reconfiguration has completed.
I did recently bring this up briefly in discussion with others more familiar with the HW side of things, but haven't had much feedback so far. AFAIK other OS drivers aren't currently accounting for a need to have a delay here.
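For illustration only: if a status bit did exist that reported when a new MUX config had taken effect (none is known here), the usual alternative to a fixed delay would be to poll it with a timeout, which is what the kernel's readl_poll_timeout() helper in <linux/iopoll.h> does for MMIO. Below is a userspace sketch of that pattern; MUX_APPLIED, struct fake_reg and fake_status_read() are hypothetical stand-ins for a real register.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical "MUX config applied" bit; no such bit is known on HSW. */
#define MUX_APPLIED (1u << 0)

/* Simulated status register that reports MUX_APPLIED from the Nth read on. */
struct fake_reg {
    unsigned reads;       /* reads performed so far */
    unsigned ready_after; /* read count at which the bit appears */
};

static uint32_t fake_status_read(struct fake_reg *r)
{
    return ++r->reads >= r->ready_after ? MUX_APPLIED : 0;
}

static double mono_ms(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec * 1000.0 + t.tv_nsec / 1e6;
}

/* Poll until (status & mask) is set, sleeping interval_ms between reads.
 * Returns 0 once the bit is seen, -1 if timeout_ms passes first. Like
 * readl_poll_timeout(), it re-reads once after the deadline so a late
 * wakeup cannot cause a spurious timeout. */
static int poll_bit_timeout(struct fake_reg *r, uint32_t mask,
                            long interval_ms, long timeout_ms)
{
    const double deadline = mono_ms() + timeout_ms;
    const struct timespec nap = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    assert(interval_ms > 0 && timeout_ms >= 0);
    for (;;) {
        if (fake_status_read(r) & mask)
            return 0;
        if (mono_ms() >= deadline)
            return (fake_status_read(r) & mask) ? 0 : -1;
        nanosleep(&nap, NULL);
    }
}
```

Polling returns as soon as the bit appears, so the common case only pays the real latency, while timeout_ms still bounds the worst case.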
For reference, I picked 100ms while stepping the delay up by orders of magnitude and finding that 10ms wasn't enough. I could experiment further with delays between 10 and 100ms, but I suppose it won't make a big difference.
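Since 10ms is known to fail and 100ms to pass, bisecting that range would find the smallest working delay in about seven trials rather than stepping through every value. This sketch assumes a hypothetical test hook (delay_ok_fn) that reports whether the config was applied after a given delay, and that the result is monotone in the delay; on real hardware each delay would likely need several repeated trials before trusting the answer. simulated_ok() is a made-up stand-in for that hook.

```c
#include <assert.h>

/* Hypothetical hook: nonzero if the new MUX config was observed to be
 * applied after waiting delay_ms. Assumed monotone: if a delay works,
 * every longer delay works too. */
typedef int (*delay_ok_fn)(long delay_ms, void *ctx);

/* Smallest delay in (bad_ms, good_ms] that passes, given that bad_ms is
 * known to fail and good_ms is known to pass. */
static long min_sufficient_delay(long bad_ms, long good_ms,
                                 delay_ok_fn ok, void *ctx)
{
    assert(bad_ms < good_ms);
    while (good_ms - bad_ms > 1) {
        long mid = bad_ms + (good_ms - bad_ms) / 2;
        if (ok(mid, ctx))
            good_ms = mid; /* mid is enough: search the lower half */
        else
            bad_ms = mid;  /* mid too short: search the upper half */
    }
    return good_ms;
}

/* Simulated hardware: the config lands after *(long *)ctx milliseconds. */
static int simulated_ok(long delay_ms, void *ctx)
{
    return delay_ms >= *(long *)ctx;
}
```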
> + config_oa_regs(dev_priv, dev_priv->perf.oa.b_counter_regs,
> + dev_priv->perf.oa.b_counter_regs_len);
> +
> + return 0;
> +}
> +
> +static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
> +{
> + I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) &
> + ~GEN6_CSUNIT_CLOCK_GATE_DISABLE));
> + I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) |
> + GEN7_DOP_CLOCK_GATE_ENABLE));
> +
> + I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) &
> + ~GT_NOA_ENABLE));
> You didn't preserve any other chicken bits during enable_metric_set.
Hmm, good point. I think I'll aim to preserve other bits when setting if that works, just in case something else needs to fiddle with the same register later.
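The enable path writes GT_NOA_ENABLE with a plain I915_WRITE(), which clears every other bit in GDT_CHICKEN_BITS, whereas the disable path already does a read-modify-write. A minimal sketch of read-modify-write in both directions against a simulated register; reg_read(), reg_write(), NOA_ENABLE_BIT and OTHER_CHICKEN are hypothetical names and bit positions, not the real i915 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated MMIO register standing in for I915_READ()/I915_WRITE(). */
static uint32_t fake_gdt_chicken_bits;

static uint32_t reg_read(void)    { return fake_gdt_chicken_bits; }
static void reg_write(uint32_t v) { fake_gdt_chicken_bits = v; }

/* Hypothetical bit positions, for illustration only. */
#define NOA_ENABLE_BIT (1u << 0)
#define OTHER_CHICKEN  (1u << 5)

/* Read-modify-write: touch only the bits in 'mask', keep the rest. */
static void reg_set_bits(uint32_t mask)   { reg_write(reg_read() | mask); }
static void reg_clear_bits(uint32_t mask) { reg_write(reg_read() & ~mask); }
```

In the patch this amounts to writing I915_READ(GDT_CHICKEN_BITS) | GT_NOA_ENABLE in hsw_enable_metric_set(), mirroring the existing clear in hsw_disable_metric_set().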
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx