On Fri, Jan 10, 2025 at 04:04:37PM -0800, Andrii Nakryiko wrote: > On Thu, Jan 2, 2025 at 5:21 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > > > On Thu, Dec 26, 2024 at 02:11:16PM +0900, Masami Hiramatsu (Google) wrote: > > > Hi, > > > > > > Here is the 22nd version of the series to re-implement the fprobe on > > > function-graph tracer. The previous version is; > > > > > > https://lore.kernel.org/all/173379652547.973433.2311391879173461183.stgit@devnote2/ > > > > > > This version is rebased on v6.13-rc4 with fixes on [3/20] for x86-32 and > > > [5/20] for build error. > > > > > > hi, > > I ran the bench and I'm seeing native_sched_clock being used > > again kretprobe_multi bench: > > > > 5.85% bench [kernel.kallsyms] [k] native_sched_clock > > | > > ---native_sched_clock > > sched_clock > > | > > --5.83%--trace_clock_local > > ftrace_return_to_handler > > return_to_handler > > syscall > > bpf_prog_test_run_opts > > completely unrelated, Jiri, but we should stop using > bpf_prog_test_run_opts() for benchmarking. It goes through FD > refcounting, which is unnecessary tiny overhead, but more importantly > it causes cache line bouncing between multiple CPUs (when doing > multi-threaded benchmarks), which skews and limits results. so you mean to switch directly to attaching/hitting kernel functions or perhaps better have kernel module for that? jirka > > > trigger_producer_batch > > start_thread > > __GI___clone3 > > > > I recall we tried to fix that before with [1] change, but that replaced > > later with [2] changes > > > > When I remove the trace_clock_local call in __ftrace_return_to_handler > > than the kretprobe-multi gets much faster (see last block below), so it > > seems worth to make it optional > > > > there's some decrease in kprobe_multi benchmark compared to base numbers, > > which I'm not sure yet why, but other than that it seems ok > > > > base: > > kprobe : 12.873 ± 0.011M/s > > kprobe-multi : 13.088 ± 0.052M/s > > kretprobe : 6.339 ± 0.003M/s > > kretprobe-multi: 7.240 ± 0.002M/s > > > > fprobe_on_fgraph: > > kprobe : 12.816 ± 0.002M/s > > kprobe-multi : 12.126 ± 0.004M/s > > kretprobe : 6.305 ± 0.018M/s > > kretprobe-multi: 7.740 ± 0.003M/s > > > > removed native_sched_clock call: > > kprobe : 12.850 ± 0.006M/s > > kprobe-multi : 12.115 ± 0.006M/s > > kretprobe : 6.270 ± 0.017M/s > > kretprobe-multi: 9.190 ± 0.005M/s > > > > > > happy new year ;-) thanks, > > > > jirka > > > > > > [1] https://lore.kernel.org/bpf/172615389864.133222.14452329708227900626.stgit@devnote2/ > > [2] https://lore.kernel.org/all/20240914214805.779822616@xxxxxxxxxxx/ > > > > [...]