Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 11 Jul 2022 22:00:46 -0700

On Mon, Jul 11, 2022 at 9:20 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jul 11, 2022 at 09:28:29AM -0700, Andrii Nakryiko wrote:
> > On Sat, Jul 9, 2022 at 5:38 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 3:05 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 4:28 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jul 07, 2022 at 12:10:30PM -0700, Andrii Nakryiko wrote:
> > > > >
> > > > > SNIP
> > > > >
> > > > > > > Maybe we should do the other way around ?
> > > > > > > cat /proc/kallsyms |grep sys_bpf
> > > > > > >
> > > > > > > and figure out the prefix from there?
> > > > > > > Then we won't need to do giant
> > > > > > > #if defined(__x86_64__)
> > > > > > > ...
> > > > > > >
> > > > > >
> > > > > > Unfortunately this won't work well due to compat and 32-bit APIs (and
> > > > > > bpf() syscall is particularly bad with also bpf_sys_bpf):
> > > > > >
> > > > > > $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> > > > > > ffffffff811cb100 t __sys_bpf
> > > > > > ffffffff811cd380 T bpf_sys_bpf
> > > > > > ffffffff811cd520 T __x64_sys_bpf
> > > > > > ffffffff811cd540 T __ia32_sys_bpf
> > > > > > ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> > > > > > ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> > > > > > ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> > > > > > ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> > > > > > ffffffff83abc410 t _eil_addr___x64_sys_bpf
> > > > > >
> > > > > > $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> > > > > > ffffffff81024480 T __x64_sys_mmap
> > > > > > ffffffff810244c0 T __ia32_sys_mmap
> > > > > > ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> > > > > > ffffffff83abae40 t _eil_addr___x64_sys_mmap
> > > > > >
> > > > > > > We have similar arch-specific switches in a few other places (USDT and
> > > > > > > lib path detection, for example), so it's not a new precedent (for
> > > > > > > better or worse).
> > > > > >
> > > > > >
> > > > > > > /proc/kallsyms has world read permissions:
> > > > > > > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > > > > > > unlike available_filter_functions.
> > > > > > >
> > > > > > > Also tracefs might be mounted in a different dir than
> > > > > > > /sys/kernel/tracing/
> > > > > > > like
> > > > > > > /sys/kernel/debug/tracing/
> > > > > >
> > > > > > Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> > > > > > but given it's done once, it might not be a big deal.
> > > > >
> > > > > we could get that also from BTF?
> > > >
> > > > I'd rather not add dependency on BTF for this.
> > >
> > > A weird and non-technical reason.
> > > Care to explain this odd excuse?
> >
> > Quite technical reason: minimizing unrelated dependencies. It's not
> > necessary to have vmlinux BTF to use kprobes (especially for kprobing
> > syscalls), so adding dependency on vmlinux BTF just to use
> > SEC("ksyscall") seems completely unnecessary, given we have other
> > alternatives.
>
> If BTF and kallsyms were alternatives then it indeed would make
> sense to avoid implementing different schemes for old kernels and recent.
> But libbpf already loads vmlinux BTF for other reasons.

Not necessarily, only if bpf_object requires vmlinux BTF, see
obj_needs_vmlinux_btf().

> It caches it and search in it is fast.
> While libbpf also parses kallsyms it doesn't cache it.
> Yet another search through kallsyms will slow down libbpf loading,
> while another search in cached BTF is close to free.
> Also we have bpf_btf_find_by_name_kind() in-kernel helper.
> We can prog_run it and optimize libbpf's BTF search to be even faster.

I'm actually starting to lean towards just trying to create a perf_event
for __<arch>_sys_<syscall> as feature detection. It will be fast and
simple, with no need to parse kallsyms or available_filter_functions,
or to take an unnecessary dependency on vmlinux BTF. And I have all that
code written already.
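
For illustration only, a rough sketch of what such probing could look
like. This is not the code from the patch. The helper names
(arch_syscall_prefix, syscall_sym_name, try_kprobe, has_syscall_wrapper)
are hypothetical, the prefix table is deliberately incomplete, and it
assumes the kprobe PMU is exposed under
/sys/bus/event_source/devices/kprobe/type. Using bpf() as the syscall to
probe is also just an assumption made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical, incomplete table of arch-specific syscall wrapper prefixes. */
static const char *arch_syscall_prefix(void)
{
#if defined(__x86_64__)
	return "x64";
#elif defined(__i386__)
	return "ia32";
#elif defined(__aarch64__)
	return "arm64";
#else
	return NULL; /* unknown arch: caller treats this as "no wrapper" */
#endif
}

/* Build "__<pfx>_sys_<name>" into buf; returns 0 or a negative errno. */
static int syscall_sym_name(char *buf, size_t sz, const char *pfx, const char *name)
{
	int n;

	assert(buf && name);
	if (!pfx)
		return -EINVAL;
	n = snprintf(buf, sz, "__%s_sys_%s", pfx, name);
	return (n < 0 || (size_t)n >= sz) ? -ENAMETOOLONG : 0;
}

/* Read the dynamic PMU type of the kprobe PMU from sysfs. */
static int kprobe_pmu_type(void)
{
	FILE *f = fopen("/sys/bus/event_source/devices/kprobe/type", "r");
	int type, err;

	if (!f)
		return -errno;
	err = fscanf(f, "%d", &type) == 1 ? 0 : -EINVAL;
	fclose(f);
	return err ? err : type;
}

/* Try to create (and immediately close) a kprobe perf_event on func. */
static int try_kprobe(const char *func)
{
	struct perf_event_attr attr;
	int type = kprobe_pmu_type();
	int fd;

	if (type < 0)
		return type;
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config1 = (uint64_t)(uintptr_t)func; /* kprobe_func */
	attr.config2 = 0;                         /* probe_offset */
	fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 0 /* cpu */,
		     -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}

/* 1 if __<arch>_sys_bpf can be kprobed, 0 otherwise (including on error). */
static int has_syscall_wrapper(void)
{
	char sym[128];

	if (syscall_sym_name(sym, sizeof(sym), arch_syscall_prefix(), "bpf"))
		return 0;
	return try_kprobe(sym) == 0;
}
```

Note that a sketch like this cannot tell "symbol doesn't exist" apart
from "not enough privileges to create the perf_event": both end up as
"no wrapper". Real detection code would probably want to look at the
errno before falling back.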