On Wed, 2017-08-23 at 20:22 +0200, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:
>
> > Anyhow, returning to the metrics i915 exposes. Some metrics are just
> > exposure of some counters supported already inside i915 PMU which do not
> > require any special sampling: at any given moment you can request the
> > counter value (these are interrupts counts, i915 power consumption).
>
> > Other metrics are similar to the ever-existing which I just described,
> > but they require activation for i915 to start to count them - this is
> > done on the event initialization (these are engine busy stats).
>
> Right, so depending on how expensive this activation is and if it can be
> done without scheduling, there are two options:
>
> 1) activate/deactivate from pmu::start()/pmu::stop()
> 2) activate/deactivate from pmu::event_init()/event->destroy() and
>    disregard all counting between pmu::stop() and pmu::start().
>
> > Finally, there is a third group which require sampling counting: they
> > are needed to be initialized and i915 pmu starts an internal timer to
> > count these values (these are some engines characteristics referenced
> > in the code as QUEUED, SEMA, WAIT).
>
> So uncore PMUs can't really do sampling. That is, perf defines sampling
> as interrupting the relevant task and then providing things like the
> %RIP value at interrupt time. Since uncore activity cannot be associated
> with any one task, no sampling allowed.
>
> Now, I'm thinking that what i915 does is slightly different, it doesn't
> provide registers to read out the counter state, but instead
> periodically writes state snapshots into some memory buffer, right?
>
> That's a bit tricky, maybe the best fit would be what PPC HV 24x7 does.
> They create an event-group, that is a set of counters that are
> co-scheduled, matching the set of counters they get from the HV
> interface (or a subset) and then sys_read() will use a TXN_READ to
> group-read the entire thing at once. In your case it could consume the
> last state snapshot instead of request one (or wait for the next,
> whatever works best).
>
> Would that work?

Hi Peter,

I have updated my fixes to Tvrtko's PMU; they are here:
https://patchwork.freedesktop.org/series/28842/. I have started to check
whether we will be able to cover all the use cases we had in mind for
this PMU, and I have some concerns and further questions.

As soon as I registered the PMU with perf_invalid_context, i.e. as an
uncore PMU, metrics from our PMU became available to root only. This
happens because we fall into the following case described in
'man perf_event_open':

  "A pid == -1 and cpu >= 0 setting is per-CPU and measures all processes
  on the specified CPU. Per-CPU events need the CAP_SYS_ADMIN capability
  or a /proc/sys/kernel/perf_event_paranoid value of less than 1."

This is a problem for us. So, could you please clarify:

1. How is the PMU API positioned? Is it for debugging purposes only, or
   can it be used in end-user release applications to monitor system
   activity and make decisions based on that?
2. How can non-privileged applications access uncore PMU metrics?
3. Is it a strict requirement to restrict uncore PMU metrics to
   privileged applications, or can this be relaxed?

I understand why the restriction made sense when only CPU-level events
were available: system-wide events were expensive. But I don't quite
understand why these restrictions are still in place for uncore PMUs,
which report their metrics right away. Is it just a remnant of the
CPU-only days that no one has needed to change? Could it be changed so
that uncore metrics can be accessed from general applications?

Regards,
Dmitry.
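For illustration, here is a minimal sketch of what an application has to do to read a counter from a PMU registered with perf_invalid_context. It is not code from the patch series: the helper names are hypothetical, the PMU name "i915" is an assumption about how the PMU registers in sysfs, and the config value is a placeholder for whatever i915-specific event encoding the series defines. It also encodes the per-CPU permission rule quoted from 'man perf_event_open', which is where an unprivileged caller gets stopped.

```c
#include <assert.h>
#include <linux/perf_event.h>   /* struct perf_event_attr */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>        /* SYS_perf_event_open */
#include <unistd.h>

/* Mirrors the rule quoted from 'man perf_event_open': a per-CPU event
 * (pid == -1, cpu >= 0) needs CAP_SYS_ADMIN or perf_event_paranoid < 1.
 * Only this one rule is modelled here. */
static bool percpu_event_allowed(int paranoid, bool has_cap_sys_admin)
{
    return has_cap_sys_admin || paranoid < 1;
}

/* Reads a single integer from a procfs/sysfs file; returns 0 on success. */
static int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Dynamic PMUs publish their perf type number in sysfs. The PMU name
 * passed in (e.g. "i915") is an assumption, not taken from the series. */
static int read_pmu_type(const char *pmu_name, int *type)
{
    char path[256];
    snprintf(path, sizeof(path),
             "/sys/bus/event_source/devices/%s/type", pmu_name);
    return read_int_file(path, type);
}

/* Fills in a plain counting (non-sampling) event for a dynamic PMU. */
static void init_uncore_attr(struct perf_event_attr *attr,
                             uint32_t type, uint64_t config)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;      /* PMU type number read from sysfs */
    attr->config = config;  /* PMU-specific event encoding (placeholder) */
}

/* Opens the event the only way an uncore PMU accepts it: pid == -1 (all
 * tasks), cpu >= 0, no group leader, no flags. Returns an fd, or -1 with
 * errno set; an unprivileged caller with perf_event_paranoid >= 1
 * typically gets EACCES here. */
static int open_uncore_event(struct perf_event_attr *attr, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr, -1, cpu, -1, 0UL);
}

/* With read_format == 0, read() returns a single u64 counter value. */
static int read_counter(int fd, uint64_t *value)
{
    return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value)
               ? 0 : -1;
}
```

A caller would do read_pmu_type("i915", &type), then init_uncore_attr(&attr, type, config) and open_uncore_event(&attr, 0). If that fails with EACCES for a normal user, it is the CAP_SYS_ADMIN / perf_event_paranoid rule above that blocked it, not anything in the i915 PMU itself.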
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx