Re: [PATCH bpf-next 1/3] samples: bpf: refactor kprobe tracing user progs with libbpf

"Daniel T. Lee" <danieltimlee@xxxxxxxxx> · Fri, 15 May 2020 17:21:48 +0900

On Thu, May 14, 2020 at 12:29 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 5/12/20 11:51 PM, Daniel T. Lee wrote:
> > On Wed, May 13, 2020 at 10:40 AM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >>
> >>
> >> On 5/12/20 7:43 AM, Daniel T. Lee wrote:
> >>> Currently, the kprobe BPF program attachment method for bpf_load is
> >>> quite old. The implementation of bpf_load "directly" controls and
> >>> manages(create, delete) the kprobe events of DEBUGFS. On the other hand,
> >>> using using the libbpf automatically manages the kprobe event.
> >>> (under bpf_link interface)
> >>>
> >>> By calling bpf_program__attach(_kprobe) in libbpf, the corresponding
> >>> kprobe is created and the BPF program will be attached to this kprobe.
> >>> To remove this, by simply invoking bpf_link__destroy will clean up the
> >>> event.
> >>>
> >>> This commit refactors kprobe tracing programs (tracex{1~7}_user.c) with
> >>> libbpf using bpf_link interface and bpf_program__attach.
> >>>
> >>> tracex2_kern.c, which tracks system calls (sys_*), has been modified to
> >>> append prefix depending on architecture.
> >>>
> >>> Signed-off-by: Daniel T. Lee <danieltimlee@xxxxxxxxx>
> >>> ---
> >>>    samples/bpf/Makefile       | 12 +++----
> >>>    samples/bpf/tracex1_user.c | 41 ++++++++++++++++++++----
> >>>    samples/bpf/tracex2_kern.c |  8 ++++-
> >>>    samples/bpf/tracex2_user.c | 55 ++++++++++++++++++++++++++------
> >>>    samples/bpf/tracex3_user.c | 65 ++++++++++++++++++++++++++++----------
> >>>    samples/bpf/tracex4_user.c | 55 +++++++++++++++++++++++++-------
> >>>    samples/bpf/tracex6_user.c | 53 +++++++++++++++++++++++++++----
> >>>    samples/bpf/tracex7_user.c | 43 ++++++++++++++++++++-----
> >>>    8 files changed, 268 insertions(+), 64 deletions(-)
> >>>
> >>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> >>> index 424f6fe7ce38..4c91e5914329 100644
> >>> --- a/samples/bpf/Makefile
> >>> +++ b/samples/bpf/Makefile
> >>> @@ -64,13 +64,13 @@ fds_example-objs := fds_example.o
> >>>    sockex1-objs := sockex1_user.o
> >>>    sockex2-objs := sockex2_user.o
> >>>    sockex3-objs := bpf_load.o sockex3_user.o
> >>> -tracex1-objs := bpf_load.o tracex1_user.o $(TRACE_HELPERS)
> >>> -tracex2-objs := bpf_load.o tracex2_user.o
> >>> -tracex3-objs := bpf_load.o tracex3_user.o
> >>> -tracex4-objs := bpf_load.o tracex4_user.o
> >>> +tracex1-objs := tracex1_user.o $(TRACE_HELPERS)
> >>> +tracex2-objs := tracex2_user.o
> >>> +tracex3-objs := tracex3_user.o
> >>> +tracex4-objs := tracex4_user.o
> >>>    tracex5-objs := bpf_load.o tracex5_user.o $(TRACE_HELPERS)
> >>> -tracex6-objs := bpf_load.o tracex6_user.o
> >>> -tracex7-objs := bpf_load.o tracex7_user.o
> >>> +tracex6-objs := tracex6_user.o
> >>> +tracex7-objs := tracex7_user.o
> >>>    test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
> >>>    trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
> >>>    lathist-objs := bpf_load.o lathist_user.o
> >>> diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c
> >>> index 55fddbd08702..1b15ab98f7d3 100644
> >>> --- a/samples/bpf/tracex1_user.c
> >>> +++ b/samples/bpf/tracex1_user.c
> >>> @@ -1,21 +1,45 @@
> >>>    // SPDX-License-Identifier: GPL-2.0
> >>>    #include <stdio.h>
> >>> -#include <linux/bpf.h>
> >>>    #include <unistd.h>
> >>> -#include <bpf/bpf.h>
> >>> -#include "bpf_load.h"
> >>> +#include <bpf/libbpf.h>
> >>>    #include "trace_helpers.h"
> >>>
> >>> +#define __must_check
> >>
> >> This is not very user friendly.
> >> Maybe not including linux/err.h and
> >> use libbpf API libbpf_get_error() instead?
> >>
> >
> > This approach looks more apparent and can stick with the libbpf API.
> > I'll update code using this way.
> >
> >>> +#include <linux/err.h>
> >>> +
> >>>    int main(int ac, char **argv)
> >>>    {
> >>> -     FILE *f;
> >>> +     struct bpf_link *link = NULL;
> >>> +     struct bpf_program *prog;
> >>> +     struct bpf_object *obj;
> >>>        char filename[256];
> >>> +     FILE *f;
> >>>
> >>>        snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> >>> +     obj = bpf_object__open_file(filename, NULL);
> >>> +     if (IS_ERR(obj)) {
> >>> +             fprintf(stderr, "ERROR: opening BPF object file failed\n");
> >>> +             obj = NULL;
> >>> +             goto cleanup;
> >>
> >> You do not need to goto cleanup, directly return 0 is okay here.
> >> The same for other files in this patch.
> >>
> >
> > As you said, it would be better to return right away than to proceed
> > any further. I'll apply the code at next patch.
> >
> >>> +     }
> >>> +
> >>> +     prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
> >>> +     if (!prog) {
> >>> +             fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
> >>> +             goto cleanup;
> >>> +     }
> >>> +
> >>> +     /* load BPF program */
> >>> +     if (bpf_object__load(obj)) {
> >>> +             fprintf(stderr, "ERROR: loading BPF object file failed\n");
> >>> +             goto cleanup;
> >>> +     }
> >>>
> >>> -     if (load_bpf_file(filename)) {
> >>> -             printf("%s", bpf_log_buf);
> >>> -             return 1;
> >>> +     link = bpf_program__attach(prog);
> >>> +     if (IS_ERR(link)) {
> >>> +             fprintf(stderr, "ERROR: bpf_program__attach failed\n");
> >>> +             link = NULL;
> >>> +             goto cleanup;
> >>>        }
> >>>
> >>>        f = popen("taskset 1 ping -c5 localhost", "r");
> >>> @@ -23,5 +47,8 @@ int main(int ac, char **argv)
> >>>
> >>>        read_trace_pipe();
> >>>
> >>> +cleanup:
> >>> +     bpf_link__destroy(link);
> >>> +     bpf_object__close(obj);
> >>
> >> Typically in kernel, we do multiple labels for such cases
> >> like
> >> destroy_link:
> >>          bpf_link__destroy(link);
> >> close_object:
> >>          bpf_object__close(obj);
> >>
> >> The error path in the main() function jumps to proper label.
> >> This is more clean and less confusion.
> >>
> >> The same for other cases in this file.
> >>
> >
> > I totally agree that multiple labels are much more intuitive.
> > But It's not very common to jump to the destroy_link label.
> >
> > Either when on the routine is completed successfully and jumps to the
> > destroy_link branch, or an error occurred while bpf_program__attach
> > was called "several" times and jumps to the destroy_link branch.
> >
> > Single bpf_program__attach like this tracex1 sample doesn't really have
> > to destroy link, since the link has been set to NULL on attach error and
> > bpf_link__destroy() is designed to do nothing if passed NULL to it.
> >
> > So I think current approach will keep consistent between samples since
> > most of the sample won't need to jump to destroy_link.
>
> Since this is the sample code, I won't enforce that. So yes, you can
> keep your current approach.
>
> >
> >>>        return 0;
> >>>    }
> >>> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
> >>> index d865bb309bcb..ff5d00916733 100644
> >>> --- a/samples/bpf/tracex2_kern.c
> >>> +++ b/samples/bpf/tracex2_kern.c
> >>> @@ -11,6 +11,12 @@
> >>>    #include <bpf/bpf_helpers.h>
> >>>    #include <bpf/bpf_tracing.h>
> >>>
> >>> +#ifdef __x86_64__
> >>> +#define SYSCALL "__x64_"
> >>> +#else
> >>> +#define SYSCALL
> >>> +#endif
> >>
> >> See test_progs.h, one more case to handle:
> >> #ifdef __x86_64__
> >> #define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
> >> #elif defined(__s390x__)
> >> #define SYS_NANOSLEEP_KPROBE_NAME "__s390x_sys_nanosleep"
> >> #else
> >> #define SYS_NANOSLEEP_KPROBE_NAME "sys_nanosleep"
> >> #endif
> >>
> >
> > That was also one of the considerations when writing patches.
> > I'm planning to refactor most of the programs in the sample using
> > libbpf, and found out that there are bunch of samples that tracks
> > syscall with kprobe. Replacing all of them will take lots of macros
> > and I thought using prefix will be better idea.
> >
> > Actually, my initial plan was to create macro of SYSCALL()
> >
> >         #ifdef __x86_64__
> >         #define PREFIX "__x64_"
> >         #elif defined(__s390x__)
> >         #define PREFIX "__s390x_"
> >         #else
> >         #define PREFIX ""
> >         #endif
> >
> >         #define SYSCALL(SYS) PREFIX ## SYS
> >
> > And to use this macro universally without creating additional headers,
> > I was trying to add this to samples/bpf/syscall_nrs.c which later
> > compiles to samples/bpf/syscall_nrs.h. But it was pretty hacky way and
> > it won't work properly. So I ended up with just adding prefix to syscall.
>
> I think it is okay to create a trace_common.h to have this definition
> defined in one place and use them in bpf programs.
>
> >
> > Is it necessary to define all of the macro for each architecture?
>
> Yes, if we define in trace_common.h, let us do for x64/x390x/others
> similar to the above.
>

Sounds great. I'll add trace_common.h and apply the syscall macro.
I'll send a new version of the patch soon.

> >
> >>> +
> >>>    struct bpf_map_def SEC("maps") my_map = {
> >>>        .type = BPF_MAP_TYPE_HASH,
> >>>        .key_size = sizeof(long),
> >>> @@ -77,7 +83,7 @@ struct bpf_map_def SEC("maps") my_hist_map = {
> >>>        .max_entries = 1024,
> >>>    };
> >>>
> >>> -SEC("kprobe/sys_write")
> >>> +SEC("kprobe/" SYSCALL "sys_write")
> >>>    int bpf_prog3(struct pt_regs *ctx)
> >>>    {
> >>>        long write_size = PT_REGS_PARM3(ctx);
> >> [...]
> >
> >
> > Thank you for your time and effort for the review :)
> >
> > Best,
> > Daniel
> >

Thank you for your time and effort for the review :)

-- 
Best,
Daniel T. Lee