On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
> kretsyscall variants (for return kprobes) to allow users to kprobe
> syscall functions in the kernel. These special sections allow users to
> ignore the complexities and differences between kernel versions and
> host architectures when it comes to the syscall wrapper and the
> corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences,
> depending on CONFIG_ARCH_HAS_SYSCALL_WRAPPER.
>
> Combined with the use of the BPF_KSYSCALL() macro, this allows users to
> just specify the intended syscall name and expected input arguments and
> leave dealing with all the variations to libbpf.
>
> In addition to SEC("ksyscall+") and SEC("kretsyscall+"), add a
> bpf_program__attach_ksyscall() API which allows specifying the syscall
> name at runtime and providing an associated BPF cookie value.
>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h          |  16 +++++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |   2 +
>  4 files changed, 128 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index cb49408eb298..4749fb84e33d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
>  					     strs, sizeof(strs)));
>  }
>
> +static const char *arch_specific_syscall_pfx(void)
> +{
> +#if defined(__x86_64__)
> +	return "x64";
> +#elif defined(__i386__)
> +	return "ia32";
> +#elif defined(__s390x__)
> +	return "s390x";
> +#elif defined(__s390__)
> +	return "s390";
> +#elif defined(__arm__)
> +	return "arm";
> +#elif defined(__aarch64__)
> +	return "arm64";
> +#elif defined(__mips__)
> +	return "mips";
> +#elif defined(__riscv)
> +	return "riscv";
> +#else
> +	return NULL;
> +#endif
> +}
> +
> +static int probe_kern_syscall_wrapper(void)
> +{
> +	/* available_filter_functions is a few times smaller than
> +	 * /proc/kallsyms and has a simpler format, so we use it as a faster
> +	 * way to check that the __<arch>_sys_bpf symbol exists, which is a
> +	 * sign that the kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> +	 * and uses syscall wrappers
> +	 */
> +	static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> +	char func_name[128], syscall_name[128];
> +	const char *ksys_pfx;
> +	FILE *f;
> +	int cnt;
> +
> +	ksys_pfx = arch_specific_syscall_pfx();
> +	if (!ksys_pfx)
> +		return 0;
> +
> +	f = fopen(kprobes_file, "r");
> +	if (!f)
> +		return 0;
> +
> +	snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> +
> +	/* check if bpf() syscall wrapper is listed as possible kprobe */
> +	while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
> +		if (strcmp(func_name, syscall_name) == 0) {
> +			fclose(f);
> +			return 1;
> +		}
> +	}

Maybe we should do it the other way around?
  cat /proc/kallsyms | grep sys_bpf
and figure out the prefix from there?
Then we won't need the giant #if defined(__x86_64__) ...

/proc/kallsyms has world read permissions:
  proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
unlike available_filter_functions.

Also tracefs might be mounted in a different dir than /sys/kernel/tracing/,
like /sys/kernel/debug/tracing/.