Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 29 Apr 2024 13:28:44 -0700

On Sun, Apr 28, 2024 at 4:25 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Thu, 25 Apr 2024 13:31:53 -0700
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> I'm just coming back from Japan (work and then a vacation), and
> catching up on my email during the 6 hour layover in Detroit.
>
> > Hey Masami,
> >
> > I can't really review most of that code as I'm completely unfamiliar
> > with all those inner workings of fprobe/ftrace/function_graph. I left
> > a few comments where there were somewhat more obvious BPF-related
> > pieces.
> >
> > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > and then with your series applied on top. Just to see if there are any
> > regressions. I think it will be a useful data point for you.
> >
> > You should be already familiar with the bench tool we have in BPF
> > selftests (I used it on some other patches for your tree).
>
> I should get familiar with your tools too.
>

It's a nifty, self-contained tool for micro-benchmarking. I replied to
Masami with a few details on how to build and use it.

> >
> > BASELINE
> > ========
> > kprobe         :   24.634 ± 0.205M/s
> > kprobe-multi   :   28.898 ± 0.531M/s
> > kretprobe      :   10.478 ± 0.015M/s
> > kretprobe-multi:   11.012 ± 0.063M/s
> >
> > THIS PATCH SET ON TOP
> > =====================
> > kprobe         :   25.144 ± 0.027M/s (+2%)
> > kprobe-multi   :   28.909 ± 0.074M/s
> > kretprobe      :    9.482 ± 0.008M/s (-9.5%)
> > kretprobe-multi:   13.688 ± 0.027M/s (+24%)
> >
> > These numbers are pretty stable and look to be more or less representative.
>
> Thanks for running this.
>
> >
> > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > about the same, though.
> >
> > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > slower, almost by 10%. Not sure why, but looks real after re-running
> > benchmarks a bunch of times and getting stable results.
> >
> > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > Again, I don't know if it is expected or not, but it's a nice
> > improvement.
> >
> > If you have any idea why kretprobes would get so much slower, it would
> > be nice to look into that and see if you can mitigate the regression
> > somehow. Thanks!
>
> My guess is that this patch set helps generic use cases for tracing the
> return of functions, but will likely add more overhead for single use
> cases. That is, kretprobe is made to be specific for a single function,
> but kretprobe-multi is more generic. Hence the generic version will
> improve at the sacrifice of the specific function. I did expect as much.
>
> That said, I think there's probably a lot of low hanging fruit that can
> be done to this series to help improve the kretprobe performance. I'm
> not sure we can get back to the baseline, but I'm hoping we can at
> least make it much better than that 10% slowdown.

That would certainly be appreciated, thanks!

But I'm also considering switching to multi-kprobe/kretprobe
automatically on the libbpf side, whenever possible, so that users get
the best performance. There might still be situations where this can't
be done, so singular kprobe/kretprobe can't be completely deprecated,
but the multi variants seem to be universally faster, so I'm going to
make them the default (I need to handle some backwards compat aspects,
but that's libbpf-specific stuff you shouldn't be concerned with).
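
For anyone who wants to double-check the percentages quoted above, here
is a minimal sketch that recomputes them from the raw throughput
numbers. The dictionary layout and the relative_change() helper are just
for illustration; they are not part of the bench tool, and the ± error
bars are ignored.

```python
# Throughput in M/s, copied from the BASELINE and
# "THIS PATCH SET ON TOP" tables quoted earlier in this thread.
baseline = {
    "kprobe": 24.634,
    "kprobe-multi": 28.898,
    "kretprobe": 10.478,
    "kretprobe-multi": 11.012,
}
patched = {
    "kprobe": 25.144,
    "kprobe-multi": 28.909,
    "kretprobe": 9.482,
    "kretprobe-multi": 13.688,
}


def relative_change(before: float, after: float) -> float:
    """Hypothetical helper: percent change of `after` relative to `before`."""
    return (after - before) / before * 100.0


# Percent change per benchmark; positive means the series is faster.
changes = {name: relative_change(baseline[name], patched[name]) for name in baseline}

for name, pct in changes.items():
    print(f"{name:<16}: {pct:+.1f}%")
```

The sign convention follows the tables: higher throughput is better, so
a negative value is a regression.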

>
> I'll be reviewing this patch set this week as I recover from jetlag.
>
> -- Steve