Re: [PATCH v4 2/5] unpack-trees: add performance tracing

Jeff King <peff@xxxxxxxx> · Mon, 13 Aug 2018 17:47:03 -0400

On Mon, Aug 13, 2018 at 09:52:41PM +0200, Duy Nguyen wrote:

> I don't think I have really fully mastered 'perf'. In this case for
> example, I don't think the default event 'cycles' is the right one
> because we are hit hard by I/O as well. I think at least I now have an
> excuse to try that famous flamegraph out ;-) but if you have time to
> run a quick analysis of this unpack-trees with 'perf', I'd love to
> learn a trick or two from you.

To be honest, I don't feel like I know how to use perf either. ;) But
I'll try to contribute what I know.

Usually I'd just use perf to get a callgraph with hot-spots, like:

  perf record -g git ...
  perf report --call-graph=fractal,0.05,caller

But that's not going to show you absolute times, which makes it lousy
for comparing run-to-run (if you speed something up, its percentage gets
smaller, but it's hard to tell _how much_ you've sped it up). And as you
note, it's measuring CPU cycles, not wall-clock.

To get output most similar to what you've shown, I think you'd define
some probes at functions of interest:

  for i in unpack_trees cache_tree_update; do
    # Cover both function entrance and return.
    perf probe -x $(which git) $i
    perf probe -x $(which git) ${i}%return
  done

and then record a run looking for those events:

  perf record -e 'probe_git:*' git ...

and then dump the result:

  perf script -F time,event

which gives you the times for each event. If you want elapsed times, you
have to compute them yourself:

  perf script -F time,event |
  perl -ne '
    /([0-9.]+):\s+probe_git:(.*):/ or die "confusing: $_";
    my ($t, $func) = ($1, $2);
    if ($func =~ s/__return$//) {
      my $start = pop @stack;
      printf "%0.9f", $t - $start;
      print " s: ";
      print "  " for (0..@stack-1);
      print $func, "\n";
    } else {
      push @stack, $t;
    }
  '

which gives a similar inverted-graph elapsed-time output that your trace
output does. One annoying downside is that you have to be root to create
or use the dynamic probes. I don't know if there's an easy way around
that. Or if there's a perf command which already handles this kind of
elapsed stuff (there's a "perf trace" which seems really close, but I
couldn't convince it to look at elapsed time for non-syscalls).

> > I can buy the argument that it's nice to have some form of profiling
> > that works everywhere, even if it's lowest-common-denominator. I just
> > wonder if we could be investing effort into tooling around existing
> > solutions that will end up more powerful and flexible in the long run.
> 
> I think part of this sprinkling is to highlight the performance
> sensitive spots in the code. And it would be helpful to ask a user to
> enable GIT_TRACE_PERFORMANCE to have a quick breakdown when something
> is reported slow. I don't care that much about other platforms to be
> honest, but perf being largely restricted to root does prevent it from
> replacing GIT_TRACE_PERFORMANCE in this case.

Yeah, this line of reasoning (which is similar to what Stefan said) is
compelling to me. GIT_TRACE_* is _most_ useful when we can ask ordinary
users to give us output. Even if we scripted the complexity I showed
above, it's not guaranteed that perf is even available or that the user
has permissions to use it.

-Peff