On Thu, Mar 17, 2022 at 8:53 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Mar 16, 2022 at 5:26 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > +
> > +struct bpf_link *
> > +bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
> > +                                      const char *pattern,
> > +                                      const struct bpf_kprobe_multi_opts *opts)
> > +{
> > +	LIBBPF_OPTS(bpf_link_create_opts, lopts);
> > +	struct kprobe_multi_resolve res = {
> > +		.pattern = pattern,
> > +	};
> > +	struct bpf_link *link = NULL;
> > +	char errmsg[STRERR_BUFSIZE];
> > +	const unsigned long *addrs;
> > +	int err, link_fd, prog_fd;
> > +	const __u64 *cookies;
> > +	const char **syms;
> > +	bool retprobe;
> > +	size_t cnt;
> > +
> > +	if (!OPTS_VALID(opts, bpf_kprobe_multi_opts))
> > +		return libbpf_err_ptr(-EINVAL);
> > +
> > +	syms    = OPTS_GET(opts, syms, false);
> > +	addrs   = OPTS_GET(opts, addrs, false);
> > +	cnt     = OPTS_GET(opts, cnt, false);
> > +	cookies = OPTS_GET(opts, cookies, false);
> > +
> > +	if (!pattern && !addrs && !syms)
> > +		return libbpf_err_ptr(-EINVAL);
> > +	if (pattern && (addrs || syms || cookies || cnt))
> > +		return libbpf_err_ptr(-EINVAL);
> > +	if (!pattern && !cnt)
> > +		return libbpf_err_ptr(-EINVAL);
> > +	if (addrs && syms)
> > +		return libbpf_err_ptr(-EINVAL);
> > +
> > +	if (pattern) {
> > +		err = libbpf_kallsyms_parse(resolve_kprobe_multi_cb, &res);
> > +		if (err)
> > +			goto error;
> > +		if (!res.cnt) {
> > +			err = -ENOENT;
> > +			goto error;
> > +		}
> > +		addrs = res.addrs;
> > +		cnt = res.cnt;
> > +	}
>
> Thanks Jiri.
> Great stuff and a major milestone!
> I've applied Masami's and your patches to bpf-next.
>
> But the above needs more work.
> Currently test_progs -t kprobe_multi
> takes 4 seconds on lockdep+debug kernel.
> Mainly because of the above loop.
>
>   18.05%  test_progs  [kernel.kallsyms]  [k] kallsyms_expand_symbol.constprop.4
>   12.53%  test_progs  libc-2.28.so       [.] _IO_vfscanf
>    6.31%  test_progs  [kernel.kallsyms]  [k] number
>    4.66%  test_progs  [kernel.kallsyms]  [k] format_decode
>    4.65%  test_progs  [kernel.kallsyms]  [k] string_nocheck
>
> Single test_skel_api() subtest takes almost a second.
>
> A cache inside libbpf probably won't help.
> Maybe introduce a bpf iterator for kallsyms?

A BPF iterator for kallsyms is a great idea! So many benefits:

  - it should be significantly more efficient *and* simpler than
    parsing /proc/kallsyms;
  - there were some upstream patches recording ksym length (i.e.,
    function size); I don't remember whether that ever landed, but
    besides that, the main complication in exposing symbol size to
    user space was the concern that the /proc/kallsyms format is an
    ABI. With a BPF iterator we can easily provide that symbol size
    without any breakage. This would be great!
  - we can allow parameterizing the iterator with options like: skip
    or include module symbols, specify a set of symbol types
    (function, variable, etc.), and so on. This would speed up common
    cases by not even decompressing irrelevant names.

In short, a kallsyms iterator would be immensely useful for any sort
of tracing tool that deals with kernel stack traces or kallsyms in
general.

But in this particular case, the kprobe_multi_resolve_syms()
implementation is extremely suboptimal. I didn't realize during review
that kallsyms_lookup_name() is a linear scan... If that's not going to
be changed to O(log(N)) some time soon, we probably need to reimplement
kprobe_multi_resolve_syms(). One way would be to sort the user strings
lexicographically and then do a single linear scan over all kallsyms,
performing a binary search over the sorted array of user strings for
each symbol. We'd stop once all the positions were "filled in" (we'd
need to keep a bitmap or bool[], probably). This way it's going to be
O(M*log(N)) instead of O(M*N) as it is right now.
BTW, Jiri, libbpf.map is supposed to have an alphabetically ordered
list of functions, so it would be good to move
bpf_program__attach_kprobe_multi_opts a bit higher, before the
libbpf_* functions.

> On the kernel side kprobe_multi_resolve_syms() looks similarly
> inefficient. I'm not sure whether it would be a bottleneck though.
>
> Orthogonal to this issue please add a new stress test
> to selftests/bpf that attaches to a lot of functions.