Re: [PATCH 0/2] Objpool performance improvements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 26, 2024 at 7:25 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> Hi Andrii,
>
> On Wed, 24 Apr 2024 14:52:12 -0700
> Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> > Improve objpool (used heavily in kretprobe hot path) performance with two
> > improvements:
> >   - inlining performance critical objpool_push()/objpool_pop() operations;
> >   - avoiding re-calculating relatively expensive nr_possible_cpus().
>
> Thanks for optimizing objpool. Both looks good to me.

Great, thanks for applying.

>
> BTW, I don't intend to stop this short-term optimization attempts,
> but I would like to ask you check the new fgraph based fprobe
> (kretprobe-multi)[1] instead of objpool/rethook.

You can see that I did :) There is tons of code and I'm not familiar
with internals of function_graph infra, but you can see I did run
benchmarks, so I'm paying attention.

But as for the objpool itself, I think it's a performant and useful
internal building block, and we might use it outside of rethook as
well, so I think making it as fast as possible is good regardless.

>
> [1] https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/
>
> I'm considering to obsolete the kretprobe (and rethook) with fprobe
> and eventually remove it. Those have similar feature and we should
> choose safer one.
>

Yep, I had a few more semi-ready patches, but I'll hold off for now
given this move to function graph, plus some of the changes that Jiri
is making in multi-kprobe code. I'll wait for things to settle down a
bit before looking again.

But just to give you some context, I'm an author of retsnoop tool, and
one of the killer features of it is LBR capture in kretprobes, which
is a tremendous help in investigating kernel failures, especially in
unfamiliar code (LBR allows to "look back" and figure out "how did we
get to this condition" after the fact). And so it's important to
minimize the amount of wasted LBR records between some kernel function
returns error (and thus is "an interesting event" and BPF program that
captures LBR is triggered). Big part of that is ftrace/fprobe/rethook
infra, so I was looking into making that part as "minimal" as
possible, in the sense of eliminating as many function calls and jump
as possible. This has an added benefit of making this hot path faster,
but my main motivation is LBR.

Anyways, just a bit of context for some of the other patches (like
inlining arch_rethook_trampoline_callback).

> Thank you,
>
> >
> > These opportunities were found when benchmarking and profiling kprobes and
> > kretprobes with BPF-based benchmarks. See individual patches for details and
> > results.
> >
> > Andrii Nakryiko (2):
> >   objpool: enable inlining objpool_push() and objpool_pop() operations
> >   objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
> >
> >  include/linux/objpool.h | 105 +++++++++++++++++++++++++++++++++++--
> >  lib/objpool.c           | 112 +++-------------------------------------
> >  2 files changed, 107 insertions(+), 110 deletions(-)
> >
> > --
> > 2.43.0
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux