On Fri, Apr 26, 2024 at 7:25 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote: > > Hi Andrii, > > On Wed, 24 Apr 2024 14:52:12 -0700 > Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > > Improve objpool (used heavily in kretprobe hot path) performance with two > > improvements: > > - inlining performance critical objpool_push()/objpool_pop() operations; > > - avoiding re-calculating relatively expensive nr_possible_cpus(). > > Thanks for optimizing objpool. Both looks good to me. Great, thanks for applying. > > BTW, I don't intend to stop this short-term optimization attempts, > but I would like to ask you check the new fgraph based fprobe > (kretprobe-multi)[1] instead of objpool/rethook. You can see that I did :) There is tons of code and I'm not familiar with internals of function_graph infra, but you can see I did run benchmarks, so I'm paying attention. But as for the objpool itself, I think it's a performant and useful internal building block, and we might use it outside of rethook as well, so I think making it as fast as possible is good regardless. > > [1] https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/ > > I'm considering to obsolete the kretprobe (and rethook) with fprobe > and eventually remove it. Those have similar feature and we should > choose safer one. > Yep, I had a few more semi-ready patches, but I'll hold off for now given this move to function graph, plus some of the changes that Jiri is making in multi-kprobe code. I'll wait for things to settle down a bit before looking again. But just to give you some context, I'm an author of retsnoop tool, and one of the killer features of it is LBR capture in kretprobes, which is a tremendous help in investigating kernel failures, especially in unfamiliar code (LBR allows to "look back" and figure out "how did we get to this condition" after the fact). And so it's important to minimize the amount of wasted LBR records between some kernel function returns error (and thus is "an interesting event" and BPF program that captures LBR is triggered). Big part of that is ftrace/fprobe/rethook infra, so I was looking into making that part as "minimal" as possible, in the sense of eliminating as many function calls and jump as possible. This has an added benefit of making this hot path faster, but my main motivation is LBR. Anyways, just a bit of context for some of the other patches (like inlining arch_rethook_trampoline_callback). > Thank you, > > > > > These opportunities were found when benchmarking and profiling kprobes and > > kretprobes with BPF-based benchmarks. See individual patches for details and > > results. > > > > Andrii Nakryiko (2): > > objpool: enable inlining objpool_push() and objpool_pop() operations > > objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids > > > > include/linux/objpool.h | 105 +++++++++++++++++++++++++++++++++++-- > > lib/objpool.c | 112 +++------------------------------------- > > 2 files changed, 107 insertions(+), 110 deletions(-) > > > > -- > > 2.43.0 > > > > > -- > Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>