On Sat, Aug 12, 2017 at 02:15:13AM +0000, Rogozhkin, Dmitry V wrote:

> $ perf stat -e instructions,i915/rcs0-busy/ workload.sh
> <... workload.sh output...>
>
> Performance counter stats for 'workload.sh':
>      1,204,616,268      instructions
>                  0      i915/rcs0-busy/
>
>        1.869169153 seconds time elapsed
>
> As you can see, the instructions event works pretty well;
> i915/rcs0-busy/ doesn't.
>
> I am afraid that our current understanding of how the PMU should work
> is not fully correct.

Can we start off by explaining to me how this i915 stuff works? Because
all I have is ~750 lines of patch without comments, which sort of leaves
me confused.

The above command tries to add an event 'i915/rcs0-busy/' to a task. How
are i915 resources associated with any one particular task? Is there a
unique i915 resource for each task? If not, I don't see how a per-task
event can ever work as expected.

> I think so, because the way the PMU entry points init(),
> add(), del(), start(), stop(), read() are implemented does not correlate
> with how many times they are called. I have counted them and here is the
> result:
> init()=19, add()=44310, del()=43900, start()=44534, stop()=0, read()=0
>
> Which means that we regularly attempt to start/stop the timer and/or
> the busy stats calculations. Another thing worth noting is that
> read() was not called at all. How is perf supposed to get the counter
> value?

Both stop() and del() are supposed to update event->count. Only if we do
sys_read() while the event is active (something perf-stat never does
IIRC) will it issue pmu::read() to get an up-to-date number.

> Yet another thing: where are we supposed to initialize our internal
> stuff? The numbers above are from a single run, and even init is called
> multiple times. Where are we supposed to de-init our stuff: each time on
> del()? That hardly makes sense.

init happens in pmu::event_init(), which can set an optional
event->destroy() function for de-init.
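To make that flow concrete, here is a minimal userspace sketch of it. This is a toy model, not kernel code and not the i915 patch: every name in it (toy_event, toy_hw_busy_ns, toy_run, ...) is hypothetical, and it assumes the hardware exposes a free-running busy counter that can be snapshotted. It only shows the bookkeeping described above: start() takes a snapshot, stop()/del() fold the delta into the event's count, read() is only needed for an event that is still active, and de-init hangs off a destroy hook set at init time.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace toy model, NOT kernel code: all names are hypothetical. */
struct toy_event {
	uint64_t count;   /* accumulated total, plays the role of event->count */
	uint64_t prev;    /* snapshot of the hardware counter at last update */
	int active;       /* 1 between start() and stop() */
	void (*destroy)(struct toy_event *ev);  /* optional de-init hook */
};

static uint64_t toy_hw_busy_ns; /* stand-in for a free-running busy counter */
static int toy_pmu_users;       /* driver-wide state shared by all events */

static void toy_destroy(struct toy_event *ev)
{
	(void)ev;
	toy_pmu_users--;            /* de-init happens here, not in del() */
}

/* Models event_init(): runs once per event and registers the destroy hook. */
static int toy_event_init(struct toy_event *ev)
{
	ev->count = 0;
	ev->prev = 0;
	ev->active = 0;
	ev->destroy = toy_destroy;
	toy_pmu_users++;
	return 0;
}

/* Fold the delta since the last snapshot into the accumulated count. */
static void toy_update(struct toy_event *ev)
{
	uint64_t now = toy_hw_busy_ns;

	ev->count += now - ev->prev;
	ev->prev = now;
}

static void toy_start(struct toy_event *ev)
{
	ev->prev = toy_hw_busy_ns;  /* snapshot, so earlier activity is ignored */
	ev->active = 1;
}

static void toy_stop(struct toy_event *ev)
{
	if (!ev->active)
		return;
	toy_update(ev);             /* stop() must leave count up to date */
	ev->active = 0;
}

/* In this model add() simply starts the event and del() stops it. */
static void toy_add(struct toy_event *ev) { toy_start(ev); }

static void toy_del(struct toy_event *ev)
{
	toy_stop(ev);
	assert(!ev->active);
}

/* Only needed for an event that is currently active. */
static uint64_t toy_read(struct toy_event *ev)
{
	if (ev->active)
		toy_update(ev);
	return ev->count;
}

/* Simulate 'cycles' schedule-in/schedule-out pairs without calling read(). */
static uint64_t toy_run(struct toy_event *ev, int cycles, uint64_t busy_per_slice)
{
	for (int i = 0; i < cycles; i++) {
		toy_add(ev);
		toy_hw_busy_ns += busy_per_slice; /* busy while scheduled in */
		toy_del(ev);
		toy_hw_busy_ns += busy_per_slice; /* busy while out: not counted */
	}
	return ev->count;
}
```

With this model, toy_run(&ev, 3, 100) leaves ev.count at 300 even though toy_read() was never called, which is the behaviour expected from stop()/del(). A count that stays 0 across many add()/del() pairs would point at the update step being skipped on the del() path.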
init() is called once for each event created. The above creates an
inherited per-task event (I think; I lost track of what the perf tool
does), and 19 seems to suggest you did some 18 fork()/clone() calls
after that, resulting in your 1 parent event with 18 children.

> I should note that if perf is run with the -I 10 option, then read()
> is being called: init_c()=265, add_c()=132726, del_c()=131482,
> start_c()=133412, stop()=0, read()=71. However, the i915 counter is
> still 0. I have tried to print counter values from within read() and
> these values are non-zero. Actually read() returns a sequence of
> <non_zero>, 0, 0, 0, ..., <non_zero> because with our add(), del() code
> we regularly start/stop our counter and execution in read() follows
> different branches.
>
> Thus, I think that right now we do not implement the PMU correctly and
> do not meet perf's expectations of the PMU. Unfortunately, right now I
> have no idea what these expectations are.

Please start by clarifying how i915 works; without that I have no idea
where to go.