On Mon, Aug 13, 2018 at 09:52:41PM +0200, Duy Nguyen wrote: > I don't think I have really fully mastered 'perf'. In this case for > example, I don't think the default event 'cycles' is the right one > because we are hit hard by I/O as well. I think at least I now have an > excuse to try that famous flamegraph out ;-) but if you have time to > run a quick analysis of this unpack-trees with 'perf', I'd love to > learn a trick or two from you. To be honest, I don't feel like I know how to use perf either. ;) But I'll try to contribute what I know. Usually I'd just use perf to get a callgraph with hot-spots, like: perf record -g git ... perf report --call-graph=fractal,0.05,caller But that's not going to show you absolute times, which makes it lousy for comparing run-to-run (if you speed something up, its percentage gets smaller, but it's hard to tell _how much_ you've sped it up). And as you note, it's measuring CPU cycles, not wall-clock. To get output most similar to what you've shown, I think you'd define some probes at functions of interest: for i in unpack_trees cache_tree_update; do # Cover both function entrance and return. perf probe -x $(which git) $i perf probe -x $(which git) ${i}%return done and then record a run looking for those events: perf record -e 'probe_git:*' git ... and then dump the result: perf script -F time,event which gives you the times for each event. If you want elapsed times, you have to compute them yourself: perf script -F time,event | perl -ne ' /([0-9.]+):\s+probe_git:(.*):/ or die "confusing: $_"; my ($t, $func) = ($1, $2); if ($func =~ s/__return$//) { my $start = pop @stack; printf "%0.9f", $t - $start; print " s: "; print " " for (0..@stack-1); print $func, "\n"; } else { push @stack, $t; } ' which gives a similar inverted-graph elapsed-time output that your trace output does. One annoying downside is that you have to be root to create or use the dynamic probes. I don't know if there's an easy way around that. Or if there's a perf command which already handles this kind of elapsed stuff (there's a "perf trace" which seems really close, but I couldn't convince it to look at elapsed time for non-syscalls). > > I can buy the argument that it's nice to have some form of profiling > > that works everywhere, even if it's lowest-common-denominator. I just > > wonder if we could be investing effort into tooling around existing > > solutions that will end up more powerful and flexible in the long run. > > I think part of this sprinkling is to highlight the performance > sensitive spots in the code. And it would be helpful to ask a user to > enable GIT_TRACE_PERFORMANCE to have a quick breakdown when something > is reported slow. I don't care that much about other platforms to be > honest, but perf being largely restricted to root does prevent it from > replacing GIT_TRACE_PERFORMANCE in this case. Yeah, this line of reasoning (which is similar to what Stefan said) is compelling to me. GIT_TRACE_* is _most_ useful when we can ask ordinary users to give us output. Even if we scripted the complexity I showed above, it's not guaranteed that perf is even available or that the user has permissions to use it. -Peff