Re: [bpf-next PATCH] bpf: libbpf, support older style kprobe load

John Fastabend <john.fastabend@xxxxxxxxx> · Mon, 21 Oct 2019 22:07:59 -0700

Andrii Nakryiko wrote:
> On Sat, Oct 19, 2019 at 1:30 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> >
> > Following ./Documentation/trace/kprobetrace.rst add support for loading
> > kprobes programs on older kernels.
> >
> 
> My main concern with this is that this code is born bit-rotten,
> because selftests are never testing the legacy code path. How did you
> think about testing this and ensuring that this keeps working going
> forward?
> 

Well we use it, but I see your point and actually I even broke the retprobe
piece hastily fixing merge conflicts in this patch. When I ran tests on it
I missed running retprobe tests on the set of kernels that would hit that
code.

> > Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
> > ---
> >  tools/lib/bpf/libbpf.c |   81 +++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 73 insertions(+), 8 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index fcea6988f962..12b3105d112c 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -5005,20 +5005,89 @@ static int determine_uprobe_retprobe_bit(void)
> >         return parse_uint_from_file(file, "config:%d\n");
> >  }
> >
> > +static int use_kprobe_debugfs(const char *name,
> > +                             uint64_t offset, int pid, bool retprobe)
> > +{
> > +       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
> > +       int fd = open(file, O_WRONLY | O_APPEND, 0);
> > +       char buf[PATH_MAX];
> > +       int err;
> > +
> > +       if (fd < 0) {
> > +               pr_warning("failed open kprobe_events: %s\n",
> > +                          strerror(errno));
> > +               return -errno;
> 
> errno after pr_warning() call might be clobbered, you need to save it
> locally first

Seems so thanks.

> 
> > +       }
> > +
> > +       snprintf(buf, sizeof(buf), "%c:kprobes/%s %s",
> > +                retprobe ? 'r' : 'p', name, name);
> 
> remember result, check it to detect overflow, and use it in write below?

sure seems more robust. If someone has names longer than PATH_MAX though
it seems a bit much.

> 
> > +
> > +       err = write(fd, buf, strlen(buf));
> > +       close(fd);
> > +       if (err < 0)
> > +               return -errno;
> > +       return 0;
> > +}
> > +
> >  static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
> >                                  uint64_t offset, int pid)
> >  {
> >         struct perf_event_attr attr = {};
> >         char errmsg[STRERR_BUFSIZE];
> > +       uint64_t config1 = 0;
> >         int type, pfd, err;
> >
> >         type = uprobe ? determine_uprobe_perf_type()
> >                       : determine_kprobe_perf_type();
> >         if (type < 0) {
> > -               pr_warning("failed to determine %s perf type: %s\n",
> > -                          uprobe ? "uprobe" : "kprobe",
> > -                          libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
> > -               return type;
> > +               if (uprobe) {
> > +                       pr_warning("failed to determine uprobe perf type %s: %s\n",
> > +                                  name,
> > +                                  libbpf_strerror_r(type,
> > +                                                    errmsg, sizeof(errmsg)));
> > +               } else {
> 
> I think this legacy kprobe setup deserves its own function (maybe even
> combine it with use_kprobe_debugfs to do entire attribute setup from A
> to Z?)
> 
> These 2 levels of nestedness is also unfortunate, how about having two
> functions that are filling out perf_event_attr? Something like:
> 
> err = determine_perf_probe_attr(...)
> if (err)
>     err = determine_legacy_probe_attr(...)
> if (!err)
>     <bail out>
> do perf call here
> 

Perhaps it makes sense to even uplevel this into the API? Something like
bpf_program__attach_legacy_kprobe() then we could test it easer?

> 
> > +                       /* If we do not have an event_source/../kprobes then we
> > +                        * can try to use kprobe-base event tracing, for details
> > +                        * see ./Documentation/trace/kprobetrace.rst
> > +                        */
> > +                       const char *file = "/sys/kernel/debug/tracing/events/kprobes/";
> > +                       char c[PATH_MAX];
> 
> what does c stand for?

Can name it file and push the path into snprintf() below.

> 
> > +                       int fd, n;
> > +
> > +                       snprintf(c, sizeof(c), "%s/%s/id", file, name);
> 
> check result? also, is there a reason to not use
> "/sys/kernel/debug/tracing/events/kprobes/" directly in format string?
> 

I believe when I wrote that I just like the it as a const char better.
But no reason if you like it inline better.

> > +
> > +                       err = use_kprobe_debugfs(name, offset, pid, retprobe);
> > +                       if (err)
> > +                               return err;
> > +
> > +                       type = PERF_TYPE_TRACEPOINT;
> > +                       fd = open(c, O_RDONLY, 0);
> > +                       if (fd < 0) {
> > +                               pr_warning("failed to open tracepoint %s: %s\n",
> > +                                          c, strerror(errno));
> > +                               return -errno;
> > +                       }
> > +                       n = read(fd, c, sizeof(c));
> > +                       close(fd);
> > +                       if (n < 0) {
> > +                               pr_warning("failed to read %s: %s\n",
> > +                                          c, strerror(errno));
> 
> It's a bit fishy that you are reading into c and then print out c on
> error. Its contents might be corrupted at this point.
> 
> And same thing about errno as above.

Sure just didn't see much point in using yet another buffer. We can
just make the pr_warning general enough that the buffer isn't needed.

> 
> > +                               return -errno;
> > +                       }
> > +                       c[n] = '\0';
> > +                       config1 = strtol(c, NULL, 0);
> 
> no need for config1 variable, just attr.config = strtol(...)?

yes, fallout from some code churn. But no reason for it.

> 
> > +                       attr.size = sizeof(attr);
> > +                       attr.type = type;
> > +                       attr.config = config1;
> > +                       attr.sample_period = 1;
> > +                       attr.wakeup_events = 1;
> > +               }
> > +       } else {
> > +               config1 = ptr_to_u64(name);
> 
> same, just straight attr.config1 = ... ?

yes.

> 
> > +               attr.size = sizeof(attr);
> > +               attr.type = type;
> > +               attr.config1 = config1; /* kprobe_func or uprobe_path */
> > +               attr.config2 = offset;  /* kprobe_addr or probe_offset */
> >         }
> >         if (retprobe) {
> >                 int bit = uprobe ? determine_uprobe_retprobe_bit()
> > @@ -5033,10 +5102,6 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
> >                 }
> >                 attr.config |= 1 << bit;
> >         }
> > -       attr.size = sizeof(attr);
> > -       attr.type = type;
> > -       attr.config1 = ptr_to_u64(name); /* kprobe_func or uprobe_path */
> > -       attr.config2 = offset;           /* kprobe_addr or probe_offset */
> >
> >         /* pid filter is meaningful only for uprobes */
> >         pfd = syscall(__NR_perf_event_open, &attr,
> >
> 
> What about the detaching? Would closing perf event FD be enough?
> Wouldn't we need to clear a probe with -:<event>?

It seems to be enough to close the fd. From an API standpoint might
make sense to have a detach() though if adding an attach().