Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
>
> This is useful to analyze the overall load of the system.
>
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
>
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
>
> v4:
>  * Refactored for timer period accounting.
>
> v5:
>  * Avoid 64-division. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> ---
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>  				div_u64((u64)period_ns *
>  					I915_SAMPLE_QUEUED_DIVISOR,
>  					1000000));
> +
> +		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> +			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> +					last_seqno - current_seqno,
> +					div_u64((u64)period_ns *
> +						I915_SAMPLE_QUEUED_DIVISOR,
> +						1000000));

Are we worried about losing precision with qd.ns?

	add_sample_mult(SAMPLE, x, period_ns);

here

> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>  		val = engine->pmu.sample[sample].cur;
>
>  		if (sample == I915_SAMPLE_QUEUED ||
> -		    sample == I915_SAMPLE_RUNNABLE)
> +		    sample == I915_SAMPLE_RUNNABLE ||
> +		    sample == I915_SAMPLE_RUNNING)
>  			val = div_u64(val, MSEC_PER_SEC); /* to qd */

and

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

So that gives us a limit of ~1 million qd (assuming the user cares for
about 1s intervals). Up to 8 million wlog with

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Anyway, just concerned to have more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)
-Chris
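
For reference, a minimal standalone sketch of the arithmetic Chris is
suggesting: accumulate raw queue-depth * nanoseconds in the sampling timer
(no per-sample division) and do a single scaled division only when the event
is read. This is plain userspace C modelling the arithmetic, not i915 code;
the divisor value of 1024 and the multiply-accumulate behaviour of
add_sample_mult() are assumptions for illustration, not taken from the patch
itself.

/*
 * Standalone model of the suggested arithmetic only -- not kernel code.
 * Assumes I915_SAMPLE_QUEUED_DIVISOR == 1024 and that add_sample_mult()
 * is a plain multiply-accumulate into the sample counter.
 */
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_DIVISOR	1024ULL		/* stand-in for I915_SAMPLE_QUEUED_DIVISOR */
#define NSEC_PER_SEC	1000000000ULL

/* Timer side: no division at all, just qd * period_ns accumulated. */
static void add_sample_mult(uint64_t *sample, uint32_t val, uint32_t mul)
{
	*sample += (uint64_t)val * mul;
}

/* Read side: the single 64-bit division per read (div_u64() in the kernel). */
static uint64_t event_read(uint64_t cur)
{
	return cur * SAMPLE_DIVISOR / NSEC_PER_SEC;
}

int main(void)
{
	uint64_t cur = 0;

	/* Queue depth 3 sampled over 200 periods of 5ms each (1s total). */
	for (int i = 0; i < 200; i++)
		add_sample_mult(&cur, 3, 5 * 1000 * 1000);

	/* Prints 3072, i.e. a queue depth of 3 expressed in 1/1024ths. */
	printf("%llu\n", (unsigned long long)event_read(cur));
	return 0;
}

The /8 variant quoted above is the same idea with the headroom shifted
around: multiplying by I915_SAMPLE_QUEUED_DIVISOR/8 instead of the full
divisor lets the accumulated counter grow 8x larger before the product
overflows a u64, while dividing NSEC_PER_SEC by 8 keeps the reported scale
unchanged.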