Re: [PATCH bpf-next] libbpf: always specify expected_attach_type on program load if supported

Andrey Ignatov <rdna@xxxxxx> · Mon, 13 Apr 2020 15:44:12 -0700

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> [Mon, 2020-04-13 15:00 -0700]:
> On Mon, Apr 13, 2020 at 1:21 PM Andrey Ignatov <rdna@xxxxxx> wrote:
> >
> > Andrii Nakryiko <andriin@xxxxxx> [Sat, 2020-04-11 22:58 -0700]:
> > > For some types of BPF programs that utilize expected_attach_type, libbpf won't
> > > set load_attr.expected_attach_type, even if expected_attach_type is known from
> > > section definition. This was done to preserve backwards compatibility with old
> > > kernels that didn't recognize expected_attach_type attribute yet (which was
> > > added in 5e43f899b03a ("bpf: Check attach type at prog load time"). But this
> > > is problematic for some BPF programs that utilize never features that require
> > > kernel to know specific expected_attach_type (e.g., extended set of return
> > > codes for cgroup_skb/egress programs).
> > >
> > > This patch makes libbpf specify expected_attach_type by default, but also
> > > detect support for this field in kernel and not set it during program load.
> > > This allows to have a good metadata for bpf_program
> > > (e.g., bpf_program__get_extected_attach_type()), but still work with old
> > > kernels (for cases where it can work at all).
> > >
> > > Additionally, due to expected_attach_type being always set for recognized
> > > program types, bpf_program__attach_cgroup doesn't have to do extra checks to
> > > determine correct attach type, so remove that additional logic.
> > >
> > > Also adjust section_names selftest to account for this change.
> > >
> > > More detailed discussion can be found in [0].
> > >
> > >   [0] https://lore.kernel.org/bpf/20200412003604.GA15986@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > >
> > > Reported-by: Andrey Ignatov <rdna@xxxxxx>
> > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> > > ---
> > >  tools/lib/bpf/libbpf.c                        | 123 +++++++++++-------
> > >  .../selftests/bpf/prog_tests/section_names.c  |  42 +++---
> > >  2 files changed, 106 insertions(+), 59 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index ff9174282a8c..925f720deea0 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -178,6 +178,8 @@ struct bpf_capabilities {
> > >       __u32 array_mmap:1;
> > >       /* BTF_FUNC_GLOBAL is supported */
> > >       __u32 btf_func_global:1;
> > > +     /* kernel support for expected_attach_type in BPF_PROG_LOAD */
> > > +     __u32 exp_attach_type:1;
> > >  };
> > >
> > >  enum reloc_type {
> > > @@ -194,6 +196,22 @@ struct reloc_desc {
> > >       int sym_off;
> > >  };
> > >
> > > +struct bpf_sec_def;
> > > +
> > > +typedef struct bpf_link *(*attach_fn_t)(const struct bpf_sec_def *sec,
> > > +                                     struct bpf_program *prog);
> > > +
> > > +struct bpf_sec_def {
> > > +     const char *sec;
> > > +     size_t len;
> > > +     enum bpf_prog_type prog_type;
> > > +     enum bpf_attach_type expected_attach_type;
> > > +     bool is_exp_attach_type_optional;
> > > +     bool is_attachable;
> > > +     bool is_attach_btf;
> > > +     attach_fn_t attach_fn;
> > > +};
> > > +
> > >  /*
> > >   * bpf_prog should be a better name but it has been used in
> > >   * linux/filter.h.
> > > @@ -204,6 +222,7 @@ struct bpf_program {
> > >       char *name;
> > >       int prog_ifindex;
> > >       char *section_name;
> > > +     const struct bpf_sec_def *sec_def;
> > >       /* section_name with / replaced by _; makes recursive pinning
> > >        * in bpf_object__pin_programs easier
> > >        */
> > > @@ -3315,6 +3334,32 @@ static int bpf_object__probe_array_mmap(struct bpf_object *obj)
> > >       return 0;
> > >  }
> > >
> > > +static int
> > > +bpf_object__probe_exp_attach_type(struct bpf_object *obj)
> > > +{
> > > +     struct bpf_load_program_attr attr;
> > > +     struct bpf_insn insns[] = {
> > > +             BPF_MOV64_IMM(BPF_REG_0, 0),
> > > +             BPF_EXIT_INSN(),
> > > +     };
> > > +     int fd;
> > > +
> > > +     memset(&attr, 0, sizeof(attr));
> > > +     attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
> > > +     attr.expected_attach_type = BPF_CGROUP_INET_EGRESS;
> >
> > Could you clarify semantics of this function please?
> >
> > According to the name it looks like it should check whether
> > expected_attach_type attribute is supported or not. But
> > BPF_CGROUP_INET_EGRESS doesn't align with this since
> > expected_attach_type itself was added long before it was supported for
> > BPF_CGROUP_INET_EGRESS.
> >
> > For example 4fbac77d2d09 ("bpf: Hooks for sys_bind") added in Mar 2018 is
> > the first hook ever that used expected_attach_type.
> >
> > aac3fc320d94 ("bpf: Post-hooks for sys_bind") added a bit later is the
> > first hook that made expected_attach_type optional (for
> > BPF_CGROUP_INET_SOCK_CREATE).
> >
> > But 5cf1e9145630 ("bpf: cgroup inet skb programs can return 0 to 3") for
> > BPF_CGROUP_INET_EGRESS was merged more than a year after the previous
> > two.
> 
> I'm checking if kernel is rejecting non-zero expected_attach_type
> field in bpf_attr for BPF_PROG_LOAD command.
> 
> Before 5e43f899b03a ("bpf: Check attach type at prog load time"),
> kernel would reject non-zero expected_attach_type because
> expected_attach_type didn't exist in bpf_attr. So if that's the case,
> we shouldn't specify expected_attach_type.
> 
> After that commit, BPF_CGROUP_INET_EGRESS for
> BPF_PROG_TYPE_CGROUP_SOCK would be supported, even if it is optional,
> so using that combination should work.
> 
> Did I miss something?

So you're saying that there is nothing special about
BPF_CGROUP_INET_EGRESS, you just need _any_ attach type and it just
happened so that you used this one.

That sounds fine, but could you clarify it with a comment please,
otherwise it looks confusing: "why BPF_CGROUP_INET_EGRESS and not some
other attach type, for example any of those which got optional
expected_attach_type earlier, what's so special about
BPF_CGROUP_INET_EGRESS".

Also, I just realized that this combination: BPF_PROG_TYPE_CGROUP_SOCK and
BPF_CGROUP_INET_EGRESS is incorrect: BPF_CGROUP_INET_EGRESS attach type
corresponds to BPF_PROG_TYPE_CGROUP_SKB prog type, not to
BPF_PROG_TYPE_CGROUP_SOCK. This call will always fail with EINVAL on new
kernels, because of this code in bpf_prog_load_check_attach:

	switch (prog_type) {
	case BPF_PROG_TYPE_CGROUP_SOCK:
		switch (expected_attach_type) {
		case BPF_CGROUP_INET_SOCK_CREATE:
		case BPF_CGROUP_INET4_POST_BIND:
		case BPF_CGROUP_INET6_POST_BIND:
			return 0;
		default:
			return -EINVAL;
		}

That should be fixed.

And since this has to be changed anyway I'd go with BPF_PROG_TYPE_CGROUP_SOCK
and BPF_CGROUP_INET_SOCK_CREATE combination since this is the very first
combination in kernel that relied on optional expected_attach_type -- that
would make more sense IMO. And clarifying comment as mentioned above :)

> > > +     attr.insns = insns;
> > > +     attr.insns_cnt = ARRAY_SIZE(insns);
> > > +     attr.license = "GPL";
> > > +
> > > +     fd = bpf_load_program_xattr(&attr, NULL, 0);
> > > +     if (fd >= 0) {
> > > +             obj->caps.exp_attach_type = 1;
> > > +             close(fd);
> > > +             return 1;
> > > +     }
> > > +     return 0;
> > > +}
> > > +
> > >  static int
> > >  bpf_object__probe_caps(struct bpf_object *obj)
> > >  {
> > > @@ -3325,6 +3370,7 @@ bpf_object__probe_caps(struct bpf_object *obj)
> > >               bpf_object__probe_btf_func_global,
> > >               bpf_object__probe_btf_datasec,
> > >               bpf_object__probe_array_mmap,
> > > +             bpf_object__probe_exp_attach_type,
> > >       };
> > >       int i, ret;
> > >
> > > @@ -4861,7 +4907,13 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
> > >
> > >       memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
> > >       load_attr.prog_type = prog->type;
> > > -     load_attr.expected_attach_type = prog->expected_attach_type;
> > > +     /* old kernels might not support specifying expected_attach_type */
> > > +     if (!prog->caps->exp_attach_type && prog->sec_def &&
> > > +         prog->sec_def->is_exp_attach_type_optional)
> > > +             load_attr.expected_attach_type = 0;
> > > +     else
> > > +             load_attr.expected_attach_type = prog->expected_attach_type;
> >
> > I'm having a hard time checking whether it'll work for all cases or may
> > not work for some combination of prog/attach type and kernel version
> > since there are many subtleties.
> >
> > For example BPF_PROG_TYPE_CGROUP_SOCK has both a hook where
> > expected_attach_type is optional (BPF_CGROUP_INET_SOCK_CREATE) and hooks
> > where it's required (BPF_CGROUP_INET{4,6}_POST_BIND), and there
> > bpf_prog_load_fixup_attach_type() function in always sets
> > expected_attach_type if it's not yet.
> 
> Right, so we use the fact that they are allowed, even if optional.
> Libbpf should provide correct expected_attach_type, according to
> section definitions and kernel should be happy (unless user specified
> wrong section name, of course, but we can't help that).
> 
> >
> > But I don't have context on all hooks that can be affected by this
> > change and could easily miss something.
> >
> > Ideally it should be verified by tests. Current section_names.c test
> > only verifies what will be returned, but AFAIK there is no test that
> > checks whether provided combination of prog_type/expected_attach_type at
> > load time and attach_type at attach time would actually work both on
> > current and old kernels. Do you think it's possible to add such a
> > selftest? (current libbpf CI supports running on old kernels, doesn't
> > it?)
> 
> So all the existing selftests are essentially verifying this, if run
> on old kernel. I don't think libbpf currently runs tests on such old
> kernels, though. But there is no extra selftest that we need to add,
> because every single existing one will execute this piece of libbpf
> logic.

Apparently existing tests didn't catch the very obvious bug with
BPF_PROG_TYPE_CGROUP_SOCK / BPF_CGROUP_INET_EGRESS invalid combination.

I think it'd be useful to start with at least basic test focused on
expected_attach_type. Then later extend it to new attach types when they're
being added and, ideally, to existing ones.

> > > +     pr_warn("prog %s exp_Attach %d\n", prog->name, load_attr.expected_attach_type);
> > >       if (prog->caps->name)
> > >               load_attr.name = prog->name;
> > >       load_attr.insns = insns;
> > > @@ -5062,6 +5114,8 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
> > >       return 0;
> > >  }
> > >
> 
> trimming irrelevant parts is good ;)

ack

-- 
Andrey Ignatov