Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
>
> This is useful to analyze the overall load of the system.
>
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
>
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
>
> v4:
>  * Refactored for timer period accounting.
>
> v5:
>  * Avoid 64-division. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> ---
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>  				div_u64((u64)period_ns *
>  					I915_SAMPLE_QUEUED_DIVISOR,
>  					1000000));
> +
> +		if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> +			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> +					last_seqno - current_seqno,
> +					div_u64((u64)period_ns *
> +						I915_SAMPLE_QUEUED_DIVISOR,
> +						1000000));

Are we worried about losing precision with qd.ns?

	add_sample_mult(SAMPLE, x, period_ns);

here

> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>  		val = engine->pmu.sample[sample].cur;
>
>  		if (sample == I915_SAMPLE_QUEUED ||
> -		    sample == I915_SAMPLE_RUNNABLE)
> +		    sample == I915_SAMPLE_RUNNABLE ||
> +		    sample == I915_SAMPLE_RUNNING)
>  			val = div_u64(val, MSEC_PER_SEC); /* to qd */

and

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

So that gives us a limit of ~1 million qd (assuming the user cares for
about 1s intervals). Up to 8 million wlog with

	val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Anyway, just concerned to have more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)
-Chris
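
For reference, a minimal standalone sketch of the arithmetic Chris is
suggesting: accumulate raw queue-depth * nanoseconds in the sampling timer
(no per-sample division) and do a single scaled division only when the event
is read. This is plain userspace C modelling the arithmetic, not i915 code;
the divisor value of 1024 and the multiply-accumulate behaviour of
add_sample_mult() are assumptions for illustration, not taken from the patch
itself.

/*
 * Standalone model of the suggested arithmetic only -- not kernel code.
 * Assumes I915_SAMPLE_QUEUED_DIVISOR == 1024 and that add_sample_mult()
 * is a plain multiply-accumulate into the sample counter.
 */
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_DIVISOR	1024ULL		/* stand-in for I915_SAMPLE_QUEUED_DIVISOR */
#define NSEC_PER_SEC	1000000000ULL

/* Timer side: no division at all, just qd * period_ns accumulated. */
static void add_sample_mult(uint64_t *sample, uint32_t val, uint32_t mul)
{
	*sample += (uint64_t)val * mul;
}

/* Read side: the single 64-bit division per read (div_u64() in the kernel). */
static uint64_t event_read(uint64_t cur)
{
	return cur * SAMPLE_DIVISOR / NSEC_PER_SEC;
}

int main(void)
{
	uint64_t cur = 0;

	/* Queue depth 3 sampled over 200 periods of 5ms each (1s total). */
	for (int i = 0; i < 200; i++)
		add_sample_mult(&cur, 3, 5 * 1000 * 1000);

	/* Prints 3072, i.e. a queue depth of 3 expressed in 1/1024ths. */
	printf("%llu\n", (unsigned long long)event_read(cur));
	return 0;
}

The /8 variant quoted above is the same idea with the headroom shifted
around: multiplying by I915_SAMPLE_QUEUED_DIVISOR/8 instead of the full
divisor lets the accumulated counter grow 8x larger before the product
overflows a u64, while dividing NSEC_PER_SEC by 8 keeps the reported scale
unchanged.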