On Wed, Feb 9, 2022 at 2:25 AM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Feb 7, 2022 at 1:58 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Feb 7, 2022 at 6:31 AM Hengqi Chen <hengqi.chen@xxxxxxxxx> wrote:
> > >
> > > Add syscall-specific variant of BPF_KPROBE named BPF_KPROBE_SYSCALL ([0]).
> > > The new macro hides the underlying way of getting syscall input arguments.
> > > With the new macro, the following code:
> > >
> > >     SEC("kprobe/__x64_sys_close")
> > >     int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
> > >     {
> > >         int fd;
> > >
> > >         fd = PT_REGS_PARM1_CORE(regs);
> > >         /* do something with fd */
> > >     }
> > >
> > > can be written as:
> > >
> > >     SEC("kprobe/__x64_sys_close")
> > >     int BPF_KPROBE_SYSCALL(do_sys_close, int fd)
> > >     {
> > >         /* do something with fd */
> > >     }
> > >
> > >   [0] Closes: https://github.com/libbpf/libbpf/issues/425
> > >
> > > Signed-off-by: Hengqi Chen <hengqi.chen@xxxxxxxxx>
> > > ---
> > >  tools/lib/bpf/bpf_tracing.h | 33 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
> > > index cf980e54d331..7ad9cdea99e1 100644
> > > --- a/tools/lib/bpf/bpf_tracing.h
> > > +++ b/tools/lib/bpf/bpf_tracing.h
> > > @@ -461,4 +461,37 @@ typeof(name(0)) name(struct pt_regs *ctx) \
> > >  } \
> > >  static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
> > >
> > > +#define ___bpf_syscall_args0() ctx
> > > +#define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
> > > +#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
> > > +#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
> > > +#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
> > > +#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
> > > +#define ___bpf_syscall_args(args...) ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args)
> > > +
> > > +/*
> > > + * BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for
> > > + * tracing syscall functions, like __x64_sys_close. It hides the underlying
> > > + * platform-specific low-level way of getting syscall input arguments from
> > > + * struct pt_regs, and provides a familiar typed and named function arguments
> > > + * syntax and semantics of accessing syscall input parameters.
> > > + *
> > > + * Original struct pt_regs* context is preserved as 'ctx' argument. This might
> > > + * be necessary when using BPF helpers like bpf_perf_event_output().
> > > + */
> >
> > LGTM. Please also mention that this macro relies on CO-RE so that
> > users are aware.
> >
>
> Now that Ilya's fixes are in again, added a small note about reliance
> on BPF CO-RE and pushed to bpf-next, thanks.
>
>
> On a relevant note. The whole __x64_sys_close vs sys_close depending
> on architecture and kernel version was always super annoying. BCC
> makes this transparent to users (AFAIK) and it always bothered me a
> little, but I didn't see a clean solution that fits libbpf.
>
> I think I finally found it, though. Instead of guessing whether the
> kprobe function is a syscall or not based on "sys_" prefix of a kernel
> function, we can use libbpf SEC() handling to do this transparently.
> What if we define two new SEC() definitions:
>
> SEC("ksyscall/write") and SEC("kretsyscall/write") (or maybe
> SEC("kprobe.syscall/write") and SEC("kretprobe.syscall/write"), not
> sure which one is better, voice your opinion, please).
> And for such special kprobes, libbpf will perform feature detection of
> this ARCH_SYSCALL_WRAPPER (we'll need to see the best way to do this in a
> simple and fast way, preferably without parsing kallsyms) and
> depending on it substitute either sys_write (or should it be
> __se_sys_write, according to Naveen) or __<arch>_sys_write. You get
> the idea.
>
> I like that this is still explicit and in the spirit of libbpf, but
> offloads the burden of knowing these intricate differences from users.
>
> Thoughts?

I think it will be just as fragile.
That syscall prefix has been changed by the kernel a few times now.
libbpf will be chasing a moving target.
I think keeping the magic in .h is simpler and less of a maintenance burden.
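
For illustration only, the symbol substitution proposed in the quoted
message can be sketched as a small standalone C helper. This is not libbpf
code: syscall_kprobe_symbol(), its parameters, and the idea of passing the
wrapper-detection result in as a boolean are all assumptions made for the
sketch, and the __se_sys_ variant raised in the thread is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libbpf API): build the kernel symbol that a
 * section such as "ksyscall/<name>" would end up attaching to.
 *
 * arch_prefix - wrapper prefix such as "x64" or "arm64" (assumed input)
 * has_wrapper - outcome of an ARCH_SYSCALL_WRAPPER probe (assumed input;
 *               how to detect it is left open in the thread)
 *
 * Returns 0 on success, -1 if the name does not fit into buf.
 */
static int syscall_kprobe_symbol(char *buf, size_t len, const char *name,
                                 const char *arch_prefix, bool has_wrapper)
{
    int n;

    assert(buf && name);

    if (has_wrapper && arch_prefix)
        /* Wrapped syscalls: __<arch>_sys_<name>, e.g. __x64_sys_write */
        n = snprintf(buf, len, "__%s_sys_%s", arch_prefix, name);
    else
        /* No wrapper: plain sys_<name>, e.g. sys_write */
        n = snprintf(buf, len, "sys_%s", name);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "write", prefix "x64" and has_wrapper set, the helper produces
__x64_sys_write; with has_wrapper clear it produces sys_write. The naming
rule hard-coded here is the part that would have to follow any future kernel
rename, whether it lives in libbpf or in a header.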