On Thu, Aug 17, 2023 at 8:37 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote: > > It is possible to use 'perf trace' with tracepoints and in that case we > can't initialize/use the augmented_raw_syscalls BPF skel. > > For instance, this usecase: > > # perf trace -e sched:*exec --max-events=5 > ? ( ): NetworkManager/1183 ... [continued]: poll()) = 1 > 0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17<anon_inode:[eventpoll]>, events: 0x55555f90e920, maxevents: 6) = 0 > 0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3<anon_inode:[eventfd]>, buf: 0x7ffc5a27cd30, count: 8) = 8 > 0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24<anon_inode:[eventpoll]>, events: 0x7ffc5a27cd20, maxevents: 2) = 1 > 0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26<socket:[30298]>, mmsg: 0x7ffc5a27caa0, vlen: 8) = 1 > # > > Where we want to trace just some sched tracepoints ending in 'exec' ends > up tracing all syscalls. > > Fix it by checking existing trace->trace_syscalls boolean to see if we > need the augmenter. > > A followup patch will move those sections of code used only with the > augmenter to separate functions, to get it cleaner and remove the goto, > done just for reviewing purposes. > > With this patch in place the previous behaviour is restored: no syscalls > when we have other events and no syscall names: > > [root@quaco ~]# perf probe do_filp_open "filename=pathname->name:string" > Added new event: > probe:do_filp_open (on do_filp_open with filename=pathname->name:string) > > You can now use it in all perf tools, such as: > > perf record -e probe:do_filp_open -aR sleep 1 > > [root@quaco ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1 > 0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache") > 0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6") > 0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive") > 0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias") > 0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION") > 0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION") > 0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache") > 0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT") > 0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT") > 0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE") > [root@quaco ~]# > > As well as mixing syscalls with tracepoints, getting the syscall > tracepoints used augmented using the BPF skel: > > [root@quaco ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1 > 0.000 ( ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ... > 0.005 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache") > 0.000 ( 0.011 ms): sleep/455124 ... [continued]: openat()) = 3 > 0.031 ( ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ... > 0.033 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6") > 0.031 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = 3 > 0.258 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ... > 0.261 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive") > 0.258 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = -1 ENOENT (No such file or directory) > 0.272 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ... > 0.273 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias") > > A final note: the probe:do_filp_open uses a kprobe (probably optimized > as its in the start of a function) that uses the kprobe_tracer mechanism > in the kernel to collect the pathname->name string and stash it into the > tracepoint created by 'perf probe' for that: > > [root@quaco ~]# cat /sys/kernel/debug/tracing/kprobe_events > p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string > [root@quaco ~]# > > While the syscalls:sys_enter_openat tracepoint gets its string from a > BPF program attached to raw_syscalls:sys_enter that tail calls into > another BPF program that knows the types for the openat syscall args and > thus can bpf_probe_read it right after the normal > sys_enter/sys_enter_openat tracepoint payload that comes prefixed with > whatever perf_event_open asked for (CPU, timestamp, etc): > > [root@quaco ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3 > 3176: tracepoint name sys_enter tag 0bc3fc9d11754ba1 gpl > loaded_at 2023-08-17T12:32:20-0300 uid 0 > xlated 272B jited 257B memlock 4096B map_ids 2462,2466,2463 > btf_id 2976 > -- > 3180: tracepoint name sys_enter_opena tag 19dd077f00ec2f58 gpl > loaded_at 2023-08-17T12:32:20-0300 uid 0 > xlated 328B jited 206B memlock 4096B map_ids 2466,2465 > btf_id 2976 > [root@quaco ~]# > > Fixes: 42963c8bedeb864b ("perf trace: Migrate BPF augmentation to use a skeleton") > Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx> > Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx> > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx> > Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx> > Cc: Athira Jajeev <atrajeev@xxxxxxxxxxxxxxxxxx> > Cc: bpf@xxxxxxxxxxxxxxx > Cc: Brendan Gregg <brendan.d.gregg@xxxxxxxxx> > Cc: Carsten Haitzler <carsten.haitzler@xxxxxxx> > Cc: Eduard Zingerman <eddyz87@xxxxxxxxx> > Cc: Fangrui Song <maskray@xxxxxxxxxx> > Cc: He Kuang <hekuang@xxxxxxxxxx> > Cc: Ian Rogers <irogers@xxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: James Clark <james.clark@xxxxxxx> > Cc: Jiri Olsa <jolsa@xxxxxxxxxx> > Cc: Kan Liang <kan.liang@xxxxxxxxxxxxxxx> > Cc: Leo Yan <leo.yan@xxxxxxxxxx> > Cc: llvm@xxxxxxxxxxxxxxx > Cc: Madhavan Srinivasan <maddy@xxxxxxxxxxxxx> > Cc: Mark Rutland <mark.rutland@xxxxxxx> > Cc: Namhyung Kim <namhyung@xxxxxxxxxx> > Cc: Nathan Chancellor <nathan@xxxxxxxxxx> > Cc: Naveen N. Rao <naveen.n.rao@xxxxxxxxxxxxxxxxxx> > Cc: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Cc: Ravi Bangoria <ravi.bangoria@xxxxxxx> > Cc: Rob Herring <robh@xxxxxxxxxx> > Cc: Tiezhu Yang <yangtiezhu@xxxxxxxxxxx> > Cc: Tom Rix <trix@xxxxxxxxxx> > Cc: Wang Nan <wangnan0@xxxxxxxxxx> > Cc: Wang ShaoBo <bobo.shaobowang@xxxxxxxxxx> > Cc: Yang Jihong <yangjihong1@xxxxxxxxxx> > Cc: Yonghong Song <yhs@xxxxxx> > Cc: YueHaibing <yuehaibing@xxxxxxxxxx> > Link: https://lore.kernel.org/lkml/ > Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx> Thanks, Ian > --- > tools/perf/builtin-trace.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c > index 0ebfa95895e0bf4d..3964cf44cdbcb3e8 100644 > --- a/tools/perf/builtin-trace.c > +++ b/tools/perf/builtin-trace.c > @@ -3895,7 +3895,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) > if (err < 0) > goto out_error_open; > #ifdef HAVE_BPF_SKEL > - { > + if (trace->syscalls.events.bpf_output) { > struct perf_cpu cpu; > > /* > @@ -3916,7 +3916,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) > goto out_error_mem; > > #ifdef HAVE_BPF_SKEL > - if (trace->skel->progs.sys_enter) > + if (trace->skel && trace->skel->progs.sys_enter) > trace__init_syscalls_bpf_prog_array_maps(trace); > #endif > > @@ -4850,6 +4850,9 @@ int cmd_trace(int argc, const char **argv) > } > > #ifdef HAVE_BPF_SKEL > + if (!trace.trace_syscalls) > + goto skip_augmentation; > + > trace.skel = augmented_raw_syscalls_bpf__open(); > if (!trace.skel) { > pr_debug("Failed to open augmented syscalls BPF skeleton"); > @@ -4884,6 +4887,7 @@ int cmd_trace(int argc, const char **argv) > } > trace.syscalls.events.bpf_output = evlist__last(trace.evlist); > assert(!strcmp(evsel__name(trace.syscalls.events.bpf_output), "__augmented_syscalls__")); > +skip_augmentation: > #endif > err = -1; > > -- > 2.41.0 >