On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
> kretsyscall variants (for return kprobes) to allow users to kprobe
> syscall functions in the kernel. These special sections allow users to
> ignore the complexities and differences between kernel versions and
> host architectures when it comes to the syscall wrapper and the
> corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences,
> depending on CONFIG_ARCH_HAS_SYSCALL_WRAPPER.
>
> Combined with the use of the BPF_KSYSCALL() macro, this allows users to
> just specify the intended syscall name and expected input arguments and
> leave dealing with all the variations to libbpf.
>
> In addition to SEC("ksyscall+") and SEC("kretsyscall+"), add a
> bpf_program__attach_ksyscall() API which allows specifying the syscall
> name at runtime and providing an associated BPF cookie value.
>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h          |  16 +++++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |   2 +
>  4 files changed, 128 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index cb49408eb298..4749fb84e33d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
>  					     strs, sizeof(strs)));
>  }
>
> +static const char *arch_specific_syscall_pfx(void)
> +{
> +#if defined(__x86_64__)
> +	return "x64";
> +#elif defined(__i386__)
> +	return "ia32";
> +#elif defined(__s390x__)
> +	return "s390x";
> +#elif defined(__s390__)
> +	return "s390";
> +#elif defined(__arm__)
> +	return "arm";
> +#elif defined(__aarch64__)
> +	return "arm64";
> +#elif defined(__mips__)
> +	return "mips";
> +#elif defined(__riscv)
> +	return "riscv";
> +#else
> +	return NULL;
> +#endif
> +}
> +
> +static int probe_kern_syscall_wrapper(void)
> +{
> +	/* available_filter_functions is a few times smaller than
> +	 * /proc/kallsyms and has a simpler format, so we use it as a faster
> +	 * way to check that the __<arch>_sys_bpf symbol exists, which is a
> +	 * sign that the kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> +	 * and uses syscall wrappers
> +	 */
> +	static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> +	char func_name[128], syscall_name[128];
> +	const char *ksys_pfx;
> +	FILE *f;
> +	int cnt;
> +
> +	ksys_pfx = arch_specific_syscall_pfx();
> +	if (!ksys_pfx)
> +		return 0;
> +
> +	f = fopen(kprobes_file, "r");
> +	if (!f)
> +		return 0;
> +
> +	snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> +
> +	/* check if bpf() syscall wrapper is listed as possible kprobe */
> +	while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
> +		if (strcmp(func_name, syscall_name) == 0) {
> +			fclose(f);
> +			return 1;
> +		}
> +	}

Maybe we should do it the other way around?
  cat /proc/kallsyms | grep sys_bpf
and figure out the prefix from there?
Then we won't need the giant #if defined(__x86_64__) ...

/proc/kallsyms has world read permissions:
  proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
unlike available_filter_functions.

Also tracefs might be mounted in a different dir than /sys/kernel/tracing/,
like /sys/kernel/debug/tracing/.