Re: [PATCH v2 bpf-next][RFC] libbpf: introduce legacy kprobe events support

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 7 Apr 2021 15:33:22 -0700

On Tue, Apr 6, 2021 at 9:49 PM Rafael David Tinoco
<rafaeldtinoco@xxxxxxxxxx> wrote:
>
> Sorry taking so long for replying on this… have been working in:
> https://github.com/rafaeldtinoco/conntracker/tree/main/ebpf
> as a consumer for the work being proposed by this patch.
>
> Current working version at:
> https://github.com/rafaeldtinoco/conntracker/blob/main/ebpf/patches/libbpf-introduce-legacy-kprobe-events-support.patch
> About to be changed with suggestions from this thread.
>
> >>> --- a/src/libbpf.c
> >>> +++ b/src/libbpf.c
> >>> @@ -9465,6 +9465,10 @@ struct bpf_link {
> >>>       char *pin_path;         /* NULL, if not pinned */
> >>>       int fd;                 /* hook FD, -1 if not applicable */
> >>>       bool disconnected;
> >>> +     struct {
> >>> +             const char *name;
> >>> +             bool retprobe;
> >>> +     } legacy;
> >>>  };
> >>
> >> For bpf_link->detach() I needed func_name somewhere.
> >
> > Right, though it's not func_name that you need, but "event_name".
>
> Yep.
>
> > Let's add link ([0]) to poke_kprobe_events somewhere, and probably
> > event have example full syntax of all the commands:
> >
> >  p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe
> >  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> >  p:[GRP/]EVENT] [MOD:]SYM[+0]%return [FETCHARGS]       : Set a return probe
> >  -:[GRP/]EVENT                                         : Clear a probe
> >
> >   [0] https://www.kernel.org/doc/html/latest/trace/kprobetrace.html
>
> Add [0] as a comment you say (as a reference) ? Or you mean to alter
> the way I’m writing to kprobe_events file in a more complete way ?

As a reference.

>
> > Now, you should not extend bpf_link itself. Create bpf_link_kprobe,
> > that will have those two extra fields. Put struct bpf_link as a first
> > field of bpf_link_kprobe. We used to have bpf_link_fd, you can try to
> > find it in Git history to see how it was done.
>
> Will do.
>

[...]

> > So I don't get at all why you have these toggles, especially
> > ALL_TOGGLE? You shouldn't try to determine the state of another probe.
> > You always know whether you want to enable or disable your specific
> > toggle. I'm very confused by all this.
>
> Yes, this was a confusing thing indeed and to be honest it proved to
> be very buggy when testing with conntracker. What I’ll do (or I’m
> doing) is to toggle ON to needed files before the probe is added:
>
> static inline int add_kprobe_event_legacy(const char* func_name, bool
> retprobe)
> {
>         int ret = 0;
>
>         ret |= poke_kprobe_events(true, func_name, retprobe);
>         ret |= toggle_kprobe_event_legacy_all(true);
>         ret |= toggle_single_kprobe_event_legacy(true, func_name, retprobe);
>
>         return ret;
> }
>
> 1) /sys/kernel/debug/tracing/kprobe_events => 1
> 2) /sys/kernel/debug/tracing/events/kprobes/enable => 1
> 3) /sys/kernel/debug/tracing/events/kprobes/%s/enable => 1

Ok, hold on. I don't think we should use those /enable files,
actually. Double-checking what BCC does ([0]) and my local demo app I
wrote a while ago, we use perf_event_open() to activate kprobe, once
it is created, and that's all that is necessary.

  [0] https://github.com/iovisor/bcc/blob/master/src/cc/libbpf.c#L1046

>
> And toggle off only kprobe_event:
>
> static inline int remove_kprobe_event_legacy(const char* func_name, bool
> retprobe)
> {
>         int ret = 0;
>
>         ret |= toggle_single_kprobe_event_legacy(false, func_name, retprobe);
>         ret |= poke_kprobe_events(false, func_name, retprobe);
>
>         return ret;
> }
>
> 1) /sys/kernel/debug/tracing/events/kprobes/%s/enable => 0
>
> This is working good for my tests.
>
> >
> >>> +static int kprobe_event_normalize(char *newname, size_t size, const char
> >>> *name, bool retprobe)
> >>> +{
> >>> +     int ret = 0;
> >>> +
> >>> +     if (IS_ERR(name))
> >>> +             return -1;
> >>> +
> >>> +     if (retprobe)
> >>> +             ret = snprintf(newname, size, "kprobes/%s_ret", name);
> >>> +     else
> >>> +             ret = snprintf(newname, size, "kprobes/%s", name);
> >>> +
> >>> +     if (ret <= strlen("kprobes/"))
> >>> +             ret = -errno;
> >>> +
> >>> +     return ret;
> >>> +}
> >>> +
> >>> +static int toggle_single_kprobe_event_legacy(bool on, const char *name,
> >>> bool retprobe)
> >
> > don't get why you need this function either...
>
> Because of /sys/kernel/debug/tracing/events/kprobes/%s/enable. I’m
> toggling it to OFF before removing the kprobe in kprobe_events, like
> showed above.

Alright, see above about enable files, it doesn't seem necessary,
actually. You use poke_kprobe_events() to add or remove kprobe to the
kernel. That gives you event_name and its id (from
/sys/kernel/debug/tracing/events/kprobes/%s/id). You then use that id
to create perf_event and activate BPF program:

  struct perf_event_attr attr;
  struct bpf_link* link;
  int fd = -1, err, id;
  FILE* f = NULL;

  err = poke_kprobe_events(true /*add*/, func_name, is_kretprobe);
  if (err) {
    fprintf(stderr, "failed to create kprobe event: %d\n", err);
    return NULL;
  }

  snprintf(
      fname,
      sizeof(fname),
      "/sys/kernel/debug/tracing/events/kprobes/%s/id",
      func_name);
  f = fopen(fname, "r");
  if (!f) {
    fprintf(stderr, "failed to open kprobe id file '%s': %d\n", fname, -errno);
    goto err_out;
  }

  if (fscanf(f, "%d\n", &id) != 1) {
    fprintf(stderr, "failed to read kprobe id from '%s': %d\n", fname, -errno);
    goto err_out;
  }

  fclose(f);
  f = NULL;

  memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.config = id;
  attr.type = PERF_TYPE_TRACEPOINT;
  attr.sample_period = 1;
  attr.wakeup_events = 1;

  fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
  if (fd < 0) {
    fprintf(
        stderr,
        "failed to create perf event for kprobe ID %d: %d\n",
        id,
        -errno);
    goto err_out;
  }

  link = bpf_program__attach_perf_event(prog, fd);

And that should be it. It doesn't seem like either BCC or my example
(which I'm sure worked last time) does anything with /enable files and
I'm sure all that works.

[...]

> >>>      return bpf_program__attach_kprobe(prog, retprobe, func_name);
> >>>  }
> >>
> >> I’m assuming this is okay based on your saying of detecting a feature
> >> instead of using the if(x) if(y) approach.
> >>
> >>> @@ -11280,4 +11629,7 @@ void bpf_object__destroy_skeleton(struct
> >>> bpf_object_skeleton *s)
> >>>       free(s->maps);
> >>>       free(s->progs);(),
> >>>       free(s);
> >>> +
> >>> +     remove_kprobe_event_legacy("ip_set_create", false);
> >>> +     remove_kprobe_event_legacy("ip_set_create", true);
> >>
> >> This is the main issue I wanted to show you before continuing.
> >> I cannot remove the kprobe event unless the obj is unloaded.
> >> That is why I have this hard coded here, just because I was
> >> testing. Any thoughts how to cleanup the kprobes without
> >> jeopardising the API too much ?
> >
> > cannot as in it doesn't work for whatever reason? Or what do you mean?
> >
> > I see that you had bpf_link__detach_perf_event_legacy calling
> > remove_kprobe_event_legacy, what didn't work?
> >
>
> I’m sorry for not being very clear here. What happens is that, if I
> try to remove the kprobe_event_legacy() BEFORE:
>
> if (s->progs)
>         bpf_object__detach_skeleton(s);
> if (s->obj)
>         bpf_object__close(*s->obj);
>
> It fails with generic write error on kprobe_events file. I need to
> remove legacy kprobe AFTER object closure. To workaround this on
> my project, and to show you this issue, I have come up with:
>
> void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s)
> {
>          int i, j;
>          struct probeleft {
>                  char *probename;
>                  bool retprobe;
>          } probesleft[24];
>
>          for (i = 0, j = 0; i < s->prog_cnt; i++) {
>                  struct bpf_link **link = s->progs[i].link;
>                  if ((*link)->legacy.name) {
>                          memset(&probesleft[j], 0, sizeof(struct probeleft));
>                          probesleft[j].probename = strdup((*link)->legacy.name);
>                          probesleft[j].retprobe = (*link)->legacy.retprobe;
>                          j++;
>                  }
>          }
>
>          if (s->progs)
>                  bpf_object__detach_skeleton(s);
>          if (s->obj)
>                  bpf_object__close(*s->obj);
>          free(s->maps);
>          free(s->progs);
>          free(s);
>
>          for (j--; j >= 0; j--) {
>                  remove_kprobe_event_legacy(probesleft[j].probename, probesleft[j].retprobe);
>                  free(probesleft[j].probename);
>          }
> }
>
> Which, of course, is not what I’m suggesting to the lib, but shows
> the problem and gives you a better idea on how to solve it not
> breaking the API.
>

bpf_link__destroy() callback should handle that, no? You'll close perf
event FD, which will "free up" kprobe and you can do
poke_kprobe_events(false /*remove */, ...). Or am I still missing
something?

> > You somehow ended up with 3 times more code and I have more questions
> > now then before. When you say "it doesn't work", please make sure to
> > explain what exactly doesn't work, what you did, what you expected to
> > happen/see.
>
> Deal. Thanks a lot for reviewing all this.
>
> -rafaeldtinoco
>