Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 29 Apr 2024 13:25:04 -0700

On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> Hi Andrii,
>
> On Thu, 25 Apr 2024 13:31:53 -0700
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> > Hey Masami,
> >
> > I can't really review most of that code as I'm completely unfamiliar
> > with all those inner workings of fprobe/ftrace/function_graph. I left
> > a few comments where there were somewhat more obvious BPF-related
> > pieces.
> >
> > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > and then with your series applied on top. Just to see if there are any
> > regressions. I think it will be a useful data point for you.
>
> Thanks for testing!
>
> >
> > You should be already familiar with the bench tool we have in BPF
> > selftests (I used it on some other patches for your tree).
>
> What patches we need?
>

You mean for this `bench` tool? They are part of BPF selftests (under
tools/testing/selftests/bpf), you can build them by running:

$ make RELEASE=1 -j$(nproc) bench

After that you'll get a self-container `bench` binary, which has all
the self-contained benchmarks.

You might also find a small script (benchs/run_bench_trigger.sh inside
BPF selftests directory) helpful, it collects final summary of the
benchmark run and optionally accepts a specific set of benchmarks. So
you can use it like this:

$ benchs/run_bench_trigger.sh kprobe kprobe-multi
kprobe         :   18.731 ± 0.639M/s
kprobe-multi   :   23.938 ± 0.612M/s

By default it will run a wider set of benchmarks (no uprobes, but a
bunch of extra fentry/fexit tests and stuff like this).

> >
> > BASELINE
> > ========
> > kprobe         :   24.634 ± 0.205M/s
> > kprobe-multi   :   28.898 ± 0.531M/s
> > kretprobe      :   10.478 ± 0.015M/s
> > kretprobe-multi:   11.012 ± 0.063M/s
> >
> > THIS PATCH SET ON TOP
> > =====================
> > kprobe         :   25.144 ± 0.027M/s (+2%)
> > kprobe-multi   :   28.909 ± 0.074M/s
> > kretprobe      :    9.482 ± 0.008M/s (-9.5%)
> > kretprobe-multi:   13.688 ± 0.027M/s (+24%)
>
> This looks good. Kretprobe should also use kretprobe-multi (fprobe)
> eventually because it should be a single callback version of
> kretprobe-multi.
>
> >
> > These numbers are pretty stable and look to be more or less representative.
> >
> > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > about the same, though.
> >
> > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > slower, almost by 10%. Not sure why, but looks real after re-running
> > benchmarks a bunch of times and getting stable results.
>
> Hmm, kretprobe on x86 should use ftrace + rethook even with my series.
> So nothing should be changed. Maybe cache access pattern has been
> changed?
> I'll check it with tracefs (to remove the effect from bpf related changes)
>
> >
> > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > Again, I don't know if it is expected or not, but it's a nice
> > improvement.
>
> Thanks!
>
> >
> > If you have any idea why kretprobes would get so much slower, it would
> > be nice to look into that and see if you can mitigate the regression
> > somehow. Thanks!
>
> OK, let me check it.
>
> Thank you!
>
> >
> >
> > >  51 files changed, 2325 insertions(+), 882 deletions(-)
> > >  create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> > >
> > > --
> > > Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
> > >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>