Quoting Tvrtko Ursulin (2017-09-29 13:34:52) > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > From: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx> > > The first goal is to be able to measure GPU (and invidual ring) busyness > without having to poll registers from userspace. (Which not only incurs > holding the forcewake lock indefinitely, perturbing the system, but also > runs the risk of hanging the machine.) As an alternative we can use the > perf event counter interface to sample the ring registers periodically > and send those results to userspace. > > Functionality we are exporting to userspace is via the existing perf PMU > API and can be exercised via the existing tools. For example: > > perf stat -a -e i915/rcs0-busy/ -I 1000 > > Will print the render engine busynnes once per second. All the performance > counters can be enumerated (perf list) and have their unit of measure > correctly reported in sysfs. > > v1-v2 (Chris Wilson): > > v2: Use a common timer for the ring sampling. > > v3: (Tvrtko Ursulin) > * Decouple uAPI from i915 engine ids. > * Complete uAPI defines. > * Refactor some code to helpers for clarity. > * Skip sampling disabled engines. > * Expose counters in sysfs. > * Pass in fake regs to avoid null ptr deref in perf core. > * Convert to class/instance uAPI. > * Use shared driver code for rc6 residency, power and frequency. > > v4: (Dmitry Rogozhkin) > * Register PMU with .task_ctx_nr=perf_invalid_context > * Expose cpumask for the PMU with the single CPU in the mask > * Properly support pmu->stop(): it should call pmu->read() > * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) > * Introduce refcounting of event subscriptions. > * Make pmu.busy_stats a refcounter to avoid busy stats going away > with some deleted event. > * Expose cpumask for i915 PMU to avoid multiple events creation of > the same type followed by counter aggregation by perf-stat. > * Track CPUs getting online/offline to migrate perf context. If (likely) > cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be > needed to see effect of CPU status tracking. > * End result is that only global events are supported and perf stat > works correctly. > * Deny perf driver level sampling - it is prohibited for uncore PMU. > > v5: (Tvrtko Ursulin) > > * Don't hardcode number of engine samplers. > * Rewrite event ref-counting for correctness and simplicity. > * Store initial counter value when starting already enabled events > to correctly report values to all listeners. > * Fix RC6 residency readout. > * Comments, GPL header. > > v6: > * Add missing entry to v4 changelog. > * Fix accounting in CPU hotplug case by copying the approach from > arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) > > v7: > * Log failure message only on failure. > * Remove CPU hotplug notification state on unregister. > > v8: > * Fix error unwind on failed registration. > * Checkpatch cleanup. > > v9: > * Drop the energy metric, it is available via intel_rapl_perf. > (Ville Syrjälä) > * Use HAS_RC6(p). (Chris Wilson) > * Handle unsupported non-engine events. (Dmitry Rogozhkin) > * Rebase for intel_rc6_residency_ns needing caller managed > runtime pm. > * Drop HAS_RC6 checks from the read callback since creating those > events will be rejected at init time already. > * Add counter units to sysfs so perf stat output is nicer. > * Cleanup the attribute tables for brevity and readability. > > v10: > * Fixed queued accounting. > > v11: > * Move intel_engine_lookup_user to intel_engine_cs.c > * Commit update. (Joonas Lahtinen) > > v12: > * More accurate sampling. (Chris Wilson) > * Store and report frequency in MHz for better usability from > perf stat. > * Removed metrics: queued, interrupts, rc6 counters. > * Sample engine busyness based on seqno difference only > for less MMIO (and forcewake) on all platforms. (Chris Wilson) > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > --- > +static void > +update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val) > +{ > + /* > + * Since we are doing stohastical sampling for these counter, stochastic > + * average the delta with the previous value for better accuracy. > + */ > + sample->cur += div_u64((u64)(sample->prev + val) * unit, 2); div_u64(mul_32_32(sample->prev + val, unit), 2) > + sample->prev = val; > +} > + > +static void engines_sample(struct drm_i915_private *dev_priv) > +{ > + struct intel_engine_cs *engine; > + enum intel_engine_id id; > + bool fw = false; > + > + if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0) > + return; > + > + if (!dev_priv->gt.awake) > + return; > + > + if (!intel_runtime_pm_get_if_in_use(dev_priv)) > + return; > + > + for_each_engine(engine, dev_priv, id) { > + u32 current_seqno = intel_engine_get_seqno(engine); > + u32 last_seqno = intel_engine_last_submit(engine); > + u32 val; > + > + val = !i915_seqno_passed(current_seqno, last_seqno); > + > + update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], PERIOD, > + val); Personally I think update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], PERIOD, val); carries better visual weighting. > + > + if (val && (engine->pmu.enable & > + (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) { > + fw = grab_forcewake(dev_priv, fw); > + > + val = I915_READ_FW(RING_CTL(engine->mmio_base)); > + } else { > + val = 0; > + } > + > + update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], PERIOD, > + !!(val & RING_WAIT)); > + > + update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], PERIOD, > + !!(val & RING_WAIT_SEMAPHORE)); > + } > + > + if (fw) > + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); > + > + intel_runtime_pm_put(dev_priv); Ok. > +} > + > +static void frequency_sample(struct drm_i915_private *dev_priv) > +{ > + if (dev_priv->pmu.enable & > + config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) { > + u32 val; > + > + val = dev_priv->rps.cur_freq; > + if (dev_priv->gt.awake && > + intel_runtime_pm_get_if_in_use(dev_priv)) { > + val = intel_get_cagf(dev_priv, > + I915_READ_NOTRACE(GEN6_RPSTAT1)); > + intel_runtime_pm_put(dev_priv); > + } > + > + update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], 1, > + intel_gpu_freq(dev_priv, val)); > + } > + > + if (dev_priv->pmu.enable & > + config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) { > + update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1, > + intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq)); > + } Ok. > +} Looks good as the introductory set of counters. I trust the improvements to the perf_event integration are good (as you can tell from the earlier patch, I have no idea what I'm doing there ;) Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx