Re: [PATCH 6/7] drm/i915/pmu: Add running counter

On 06/06/2018 16:23, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

We add a PMU counter to expose the number of requests currently executing
on the GPU.

This is useful to analyze the overall load of the system.

v2:
  * Rebase.
  * Drop floating point constant. (Chris Wilson)

v3:
  * Change scale to 1024 for faster arithmetic. (Chris Wilson)

v4:
  * Refactored for timer period accounting.

v5:
  * Avoid 64-division. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
@@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
                                         div_u64((u64)period_ns *
                                                 I915_SAMPLE_QUEUED_DIVISOR,
                                                 1000000));
+
+               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
+                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
+                                       last_seqno - current_seqno,
+                                       div_u64((u64)period_ns *
+                                               I915_SAMPLE_QUEUED_DIVISOR,
+                                               1000000));

Are we worried about losing precision with qd.ns?

add_sample_mult(SAMPLE, x, period_ns); here

@@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
                         val = engine->pmu.sample[sample].cur;
                         if (sample == I915_SAMPLE_QUEUED ||
-                           sample == I915_SAMPLE_RUNNABLE)
+                           sample == I915_SAMPLE_RUNNABLE ||
+                           sample == I915_SAMPLE_RUNNING)
                                 val = div_u64(val, MSEC_PER_SEC);  /* to qd */

and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

Yeah that works, thanks.
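
For readers following the exchange above, here is a minimal userspace sketch of the idea being agreed on: accumulate depth multiplied by the sampling period (qd.ns) with no division in the sampling path, and do a single division when the counter is read. The div_u64() below is a plain stand-in for the kernel helper, the value 1024 for I915_SAMPLE_QUEUED_DIVISOR is assumed from the v3 changelog, and struct sample, read_scaled() and sample_one_second() are hypothetical names used only for illustration. This is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u
/* Assumed from the v3 changelog ("Change scale to 1024"). */
#define I915_SAMPLE_QUEUED_DIVISOR 1024u

/* Userspace stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Hypothetical stand-in for the per-engine sample accumulator. */
struct sample {
	uint64_t cur;
};

/* Sampling path: accumulate depth * period in qd.ns, no division here. */
static void add_sample_mult(struct sample *s, uint32_t val, uint32_t period_ns)
{
	s->cur += (uint64_t)val * period_ns;
}

/* Read path: one division converts qd.ns into the scaled counter value. */
static uint64_t read_scaled(const struct sample *s)
{
	return div_u64(s->cur * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
}

/* Simulate one second of sampling: 200 timer ticks of 5 ms at a fixed depth. */
static uint64_t sample_one_second(uint32_t depth)
{
	struct sample s = { 0 };
	int i;

	for (i = 0; i < 200; i++)
		add_sample_mult(&s, depth, 5000000u);

	return read_scaled(&s);
}
```

With three requests running for the whole second, sample_one_second(3) accumulates 3e9 qd.ns and reads back as 3 * 1024, so userspace dividing by the scale recovers a depth of 3.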

So that gives us a limit of ~1 million qd (assuming the user cares for
about 1s intervals). Up to 8 million wlog with

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Or keep in qd.us as for frequency. I think precision is plenty in any case.
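
A small sketch of the headroom argument, under the same assumptions as before (divisor of 1024, userspace stand-in for div_u64(); the function names are hypothetical). Reducing both the multiplier and the divisor by 8 keeps the ratio identical while leaving eight times more room before the 64-bit multiplication wraps. Note that the parentheses around the reduced multiplier are added here: as written in the mail, `val * I915_SAMPLE_QUEUED_DIVISOR/8` would parse in C as `(val * I915_SAMPLE_QUEUED_DIVISOR) / 8`, which would not gain any headroom.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u
/* Assumed from the v3 changelog ("Change scale to 1024"). */
#define I915_SAMPLE_QUEUED_DIVISOR 1024u

/* Userspace stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Full-scale conversion: the product wraps once val > UINT64_MAX / 1024. */
static uint64_t to_scaled(uint64_t val)
{
	return div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
}

/* Reduced conversion: same ratio (128 / 125000000), 8x more headroom. */
static uint64_t to_scaled_reduced(uint64_t val)
{
	return div_u64(val * (I915_SAMPLE_QUEUED_DIVISOR / 8), NSEC_PER_SEC / 8);
}

/* Largest accumulated value whose product with mult still fits in 64 bits. */
static uint64_t max_input(uint32_t mult)
{
	assert(mult != 0);
	return UINT64_MAX / mult;
}
```

Both conversions return the same result for any input that does not overflow the full-scale version, because floor(8a / 8b) equals floor(a / b); only the overflow threshold differs.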

Anyway, just concerned to have more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)

It is an optimized 64-bit divide, or 64-divide as I faltered in the commit message :), so not as bad as 64/64, but still your idea is very good.

Regards,

Tvrtko