Re: [PATCH bpf-next] libbpf: always specify expected_attach_type on program load if supported

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 13 Apr 2020 21:49:35 -0700

On Mon, Apr 13, 2020 at 3:44 PM Andrey Ignatov <rdna@xxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> [Mon, 2020-04-13 15:00 -0700]:
> > On Mon, Apr 13, 2020 at 1:21 PM Andrey Ignatov <rdna@xxxxxx> wrote:
> > >
> > > Andrii Nakryiko <andriin@xxxxxx> [Sat, 2020-04-11 22:58 -0700]:
> > > > For some types of BPF programs that utilize expected_attach_type, libbpf won't
> > > > set load_attr.expected_attach_type, even if expected_attach_type is known from
> > > > section definition. This was done to preserve backwards compatibility with old
> > > > kernels that didn't recognize expected_attach_type attribute yet (which was
> > > > added in 5e43f899b03a ("bpf: Check attach type at prog load time"). But this
> > > > is problematic for some BPF programs that utilize never features that require
> > > > kernel to know specific expected_attach_type (e.g., extended set of return
> > > > codes for cgroup_skb/egress programs).
> > > >
> > > > This patch makes libbpf specify expected_attach_type by default, but also
> > > > detect support for this field in kernel and not set it during program load.
> > > > This allows to have a good metadata for bpf_program
> > > > (e.g., bpf_program__get_extected_attach_type()), but still work with old
> > > > kernels (for cases where it can work at all).
> > > >
> > > > Additionally, due to expected_attach_type being always set for recognized
> > > > program types, bpf_program__attach_cgroup doesn't have to do extra checks to
> > > > determine correct attach type, so remove that additional logic.
> > > >
> > > > Also adjust section_names selftest to account for this change.
> > > >
> > > > More detailed discussion can be found in [0].
> > > >
> > > >   [0] https://lore.kernel.org/bpf/20200412003604.GA15986@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > > >
> > > > Reported-by: Andrey Ignatov <rdna@xxxxxx>
> > > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> > > > ---
> > > >  tools/lib/bpf/libbpf.c                        | 123 +++++++++++-------
> > > >  .../selftests/bpf/prog_tests/section_names.c  |  42 +++---
> > > >  2 files changed, 106 insertions(+), 59 deletions(-)
> > > >
> > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > index ff9174282a8c..925f720deea0 100644
> > > > --- a/tools/lib/bpf/libbpf.c
> > > > +++ b/tools/lib/bpf/libbpf.c
> > > > @@ -178,6 +178,8 @@ struct bpf_capabilities {
> > > >       __u32 array_mmap:1;
> > > >       /* BTF_FUNC_GLOBAL is supported */
> > > >       __u32 btf_func_global:1;
> > > > +     /* kernel support for expected_attach_type in BPF_PROG_LOAD */
> > > > +     __u32 exp_attach_type:1;
> > > >  };
> > > >
> > > >  enum reloc_type {
> > > > @@ -194,6 +196,22 @@ struct reloc_desc {
> > > >       int sym_off;
> > > >  };
> > > >
> > > > +struct bpf_sec_def;
> > > > +
> > > > +typedef struct bpf_link *(*attach_fn_t)(const struct bpf_sec_def *sec,
> > > > +                                     struct bpf_program *prog);
> > > > +
> > > > +struct bpf_sec_def {
> > > > +     const char *sec;
> > > > +     size_t len;
> > > > +     enum bpf_prog_type prog_type;
> > > > +     enum bpf_attach_type expected_attach_type;
> > > > +     bool is_exp_attach_type_optional;
> > > > +     bool is_attachable;
> > > > +     bool is_attach_btf;
> > > > +     attach_fn_t attach_fn;
> > > > +};
> > > > +
> > > >  /*
> > > >   * bpf_prog should be a better name but it has been used in
> > > >   * linux/filter.h.
> > > > @@ -204,6 +222,7 @@ struct bpf_program {
> > > >       char *name;
> > > >       int prog_ifindex;
> > > >       char *section_name;
> > > > +     const struct bpf_sec_def *sec_def;
> > > >       /* section_name with / replaced by _; makes recursive pinning
> > > >        * in bpf_object__pin_programs easier
> > > >        */
> > > > @@ -3315,6 +3334,32 @@ static int bpf_object__probe_array_mmap(struct bpf_object *obj)
> > > >       return 0;
> > > >  }
> > > >
> > > > +static int
> > > > +bpf_object__probe_exp_attach_type(struct bpf_object *obj)
> > > > +{
> > > > +     struct bpf_load_program_attr attr;
> > > > +     struct bpf_insn insns[] = {
> > > > +             BPF_MOV64_IMM(BPF_REG_0, 0),
> > > > +             BPF_EXIT_INSN(),
> > > > +     };
> > > > +     int fd;
> > > > +
> > > > +     memset(&attr, 0, sizeof(attr));
> > > > +     attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
> > > > +     attr.expected_attach_type = BPF_CGROUP_INET_EGRESS;
> > >
> > > Could you clarify semantics of this function please?
> > >
> > > According to the name it looks like it should check whether
> > > expected_attach_type attribute is supported or not. But
> > > BPF_CGROUP_INET_EGRESS doesn't align with this since
> > > expected_attach_type itself was added long before it was supported for
> > > BPF_CGROUP_INET_EGRESS.
> > >
> > > For example 4fbac77d2d09 ("bpf: Hooks for sys_bind") added in Mar 2018 is
> > > the first hook ever that used expected_attach_type.
> > >
> > > aac3fc320d94 ("bpf: Post-hooks for sys_bind") added a bit later is the
> > > first hook that made expected_attach_type optional (for
> > > BPF_CGROUP_INET_SOCK_CREATE).
> > >
> > > But 5cf1e9145630 ("bpf: cgroup inet skb programs can return 0 to 3") for
> > > BPF_CGROUP_INET_EGRESS was merged more than a year after the previous
> > > two.
> >
> > I'm checking if kernel is rejecting non-zero expected_attach_type
> > field in bpf_attr for BPF_PROG_LOAD command.
> >
> > Before 5e43f899b03a ("bpf: Check attach type at prog load time"),
> > kernel would reject non-zero expected_attach_type because
> > expected_attach_type didn't exist in bpf_attr. So if that's the case,
> > we shouldn't specify expected_attach_type.
> >
> > After that commit, BPF_CGROUP_INET_EGRESS for
> > BPF_PROG_TYPE_CGROUP_SOCK would be supported, even if it is optional,
> > so using that combination should work.
> >
> > Did I miss something?
>
> So you're saying that there is nothing special about
> BPF_CGROUP_INET_EGRESS, you just need _any_ attach type and it just
> happened so that you used this one.

Yes.

>
> That sounds fine, but could you clarify it with a comment please,
> otherwise it looks confusing: "why BPF_CGROUP_INET_EGRESS and not some
> other attach type, for example any of those which got optional
> expected_attach_type earlier, what's so special about
> BPF_CGROUP_INET_EGRESS".

Sure.

>
> Also, I just realized that this combination: BPF_PROG_TYPE_CGROUP_SOCK and
> BPF_CGROUP_INET_EGRESS is incorrect: BPF_CGROUP_INET_EGRESS attach type
> corresponds to BPF_PROG_TYPE_CGROUP_SKB prog type, not to
> BPF_PROG_TYPE_CGROUP_SOCK. This call will always fail with EINVAL on new
> kernels, because of this code in bpf_prog_load_check_attach:

Yep, not sure how I copy-pasted BPF_PROG_TYPE_CGROUP_SOCK instead of
BPF_PROG_TYPE_CGROUP_SKB... My vim-foo clearly failed me here :)

>
>         switch (prog_type) {
>         case BPF_PROG_TYPE_CGROUP_SOCK:
>                 switch (expected_attach_type) {
>                 case BPF_CGROUP_INET_SOCK_CREATE:
>                 case BPF_CGROUP_INET4_POST_BIND:
>                 case BPF_CGROUP_INET6_POST_BIND:
>                         return 0;
>                 default:
>                         return -EINVAL;
>                 }
>
> That should be fixed.
>
> And since this has to be changed anyway I'd go with BPF_PROG_TYPE_CGROUP_SOCK
> and BPF_CGROUP_INET_SOCK_CREATE combination since this is the very first
> combination in kernel that relied on optional expected_attach_type -- that
> would make more sense IMO. And clarifying comment as mentioned above :)

Sure, sounds good, I'll use this combination instead.

>
> > > > +     attr.insns = insns;
> > > > +     attr.insns_cnt = ARRAY_SIZE(insns);
> > > > +     attr.license = "GPL";
> > > > +
> > > > +     fd = bpf_load_program_xattr(&attr, NULL, 0);
> > > > +     if (fd >= 0) {
> > > > +             obj->caps.exp_attach_type = 1;
> > > > +             close(fd);
> > > > +             return 1;
> > > > +     }
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  static int
> > > >  bpf_object__probe_caps(struct bpf_object *obj)
> > > >  {
> > > > @@ -3325,6 +3370,7 @@ bpf_object__probe_caps(struct bpf_object *obj)
> > > >               bpf_object__probe_btf_func_global,
> > > >               bpf_object__probe_btf_datasec,
> > > >               bpf_object__probe_array_mmap,
> > > > +             bpf_object__probe_exp_attach_type,
> > > >       };
> > > >       int i, ret;
> > > >
> > > > @@ -4861,7 +4907,13 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
> > > >
> > > >       memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
> > > >       load_attr.prog_type = prog->type;
> > > > -     load_attr.expected_attach_type = prog->expected_attach_type;
> > > > +     /* old kernels might not support specifying expected_attach_type */
> > > > +     if (!prog->caps->exp_attach_type && prog->sec_def &&
> > > > +         prog->sec_def->is_exp_attach_type_optional)
> > > > +             load_attr.expected_attach_type = 0;
> > > > +     else
> > > > +             load_attr.expected_attach_type = prog->expected_attach_type;
> > >
> > > I'm having a hard time checking whether it'll work for all cases or may
> > > not work for some combination of prog/attach type and kernel version
> > > since there are many subtleties.
> > >
> > > For example BPF_PROG_TYPE_CGROUP_SOCK has both a hook where
> > > expected_attach_type is optional (BPF_CGROUP_INET_SOCK_CREATE) and hooks
> > > where it's required (BPF_CGROUP_INET{4,6}_POST_BIND), and there
> > > bpf_prog_load_fixup_attach_type() function in always sets
> > > expected_attach_type if it's not yet.
> >
> > Right, so we use the fact that they are allowed, even if optional.
> > Libbpf should provide correct expected_attach_type, according to
> > section definitions and kernel should be happy (unless user specified
> > wrong section name, of course, but we can't help that).
> >
> > >
> > > But I don't have context on all hooks that can be affected by this
> > > change and could easily miss something.
> > >
> > > Ideally it should be verified by tests. Current section_names.c test
> > > only verifies what will be returned, but AFAIK there is no test that
> > > checks whether provided combination of prog_type/expected_attach_type at
> > > load time and attach_type at attach time would actually work both on
> > > current and old kernels. Do you think it's possible to add such a
> > > selftest? (current libbpf CI supports running on old kernels, doesn't
> > > it?)
> >
> > So all the existing selftests are essentially verifying this, if run
> > on old kernel. I don't think libbpf currently runs tests on such old
> > kernels, though. But there is no extra selftest that we need to add,
> > because every single existing one will execute this piece of libbpf
> > logic.
>
> Apparently existing tests didn't catch the very obvious bug with
> BPF_PROG_TYPE_CGROUP_SOCK / BPF_CGROUP_INET_EGRESS invalid combination.

Sigh.. yeah. I expected cgroup_link test to fail if that functionality
didn't work, but I missed that bpf_program__attach_cgroup() code will
use correct expected_attach_type, even if it's not provided to
BPF_PROG_LOAD.

>
> I think it'd be useful to start with at least basic test focused on
> expected_attach_type. Then later extend it to new attach types when they're
> being added and, ideally, to existing ones.

How this test should look like? I can make a test that will work only
on new kernel (e.g., by using cgroup program which needs
expected_attach_type), but it will fail on old kernels. There doesn't
seem to be a way to query expected_attach_type from kernel. Any hints
on how to make test that will pass on old and new kernels and will
validate expected_attach_type is passed properly?

>
>
> > > > +     pr_warn("prog %s exp_Attach %d\n", prog->name, load_attr.expected_attach_type);
> > > >       if (prog->caps->name)
> > > >               load_attr.name = prog->name;
> > > >       load_attr.insns = insns;
> > > > @@ -5062,6 +5114,8 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
> > > >       return 0;
> > > >  }
> > > >
> >
> > trimming irrelevant parts is good ;)
>
> ack
>
> --
> Andrey Ignatov