Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

Stanislav Fomichev <sdf@xxxxxxxxxxx> · Mon, 3 Jun 2019 18:02:54 -0700

On 06/03, Andrii Nakryiko wrote:
> On Mon, Jun 3, 2019 at 9:32 AM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> >
> > On 05/31, Andrii Nakryiko wrote:
> > > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> > > >
> > > > On 05/31, Andrii Nakryiko wrote:
> > > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > > BTF to describe mandatory and optional attributes of a map, and it
> > > > > naturally captures the type information of key and value. This eliminates
> > > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > > always in sync with the key/value type.
> > > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > > (also, the compiler won't be able to help with the misspellings).
> > >
> > > I don't think it's really worse than current bpf_map_def approach. In
> > > typical scenario, there are only two fields you need to remember: type
> > > and max_entries (notice, they are called exactly the same as in
> > > bpf_map_def, so this knowledge is transferrable). Then you'll have
> > > key/value, using which you are describing both type (using field's
> > > type) and size (calculated from the type).
> > >
> > > I can relate to that a bit: with bpf_map_def you can find the definition
> > > and see all possible fields, but one can also find a lot of examples
> > > for new map definitions as well.
> > >
> > > One big advantage of this scheme, though, is that you get that type
> > > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > > with no chance of having a mismatch, etc. This is less duplication (no
> > > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > > that macro) and there is no need to go and ping people to add those
> > > annotations to improve introspection of BPF maps.
> > Don't get me wrong, it looks good and there are advantages compared to
> > the existing way. But, again, feels to me a bit too magic. We should somehow
> > make it less magic (see below).
> >
> > > > I don't know how others feel about it, but I'd be much more comfortable
> > > > with a simpler TLV-like approach. Have a new section where the format
> > > > is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> > > > allow us to extend it the way we do with syscall args.
> > >
> > > It would help with extensibility, sure, though even current
> > > bpf_map_def approach sort of can be extended already. But it won't
> > > solve the problem of having BTF types captured for key/value (see
> > > above). Also, you'd need another macro to lay everything out properly.
> > I didn't know that we look into the list of exported symbols to estimate
> > the number of maps and then use it to derive struct bpf_map_def size.
> >
> > In that case, maybe we can keep extending struct bpf_map_def
> > and support BTF mode as a better alternative? bpf_map_def could be
> > used as a reference for which fields there are, people can still use it
> > (with BPF_ANNOTATE_KV_PAIR if needed), but they can also use
> > new BTF mode if they find that works better for them?
> >
> > Because the biggest issue for me with the BTF mode is the question
> > of where to look for the supported fields (and misspellings). People
> > on this mailing list can probably figure it out, but people who don't
> > work full time on bpf might find it hard. Having 'struct bpf_map_def'
> > as a reference (or a good supported piece of documentation) might help
> 
> So yeah, it's more about documentation and examples, it seems, rather
> than having a C struct in code, right? Today, if I need to add new
> map, I copy/paste either from example, existing code or look up
Well, you know where to copy paste from ;-)

> documentation. You'll be able to do the same with new way (just grep
> for \.maps).
Yes, it's mostly about discoverability. Either documentation or
the real underlying structure could help with that.

> > with that.
> >
> > What do you think? The only issue is that we now have two formats
> > to support :-/
> 
> We'll have to support existing bpf_map_def for backwards compatibility
> (and see my reply to Jakub, you can just plain re-use struct
> bpf_map_def today with BTF approach, just put it into .maps section),
> but I'd love to avoid having to support new features in two
> different ways, so if we go with BTF, I'd restrict new features to BTF
> only, moving forward.
But what's wrong with trying to extend bpf_map_def for a while? It looks like
we have everything in place to do that. I understand your desire
to deprecate everything and move on, but when was BTF support added to
LLVM? 8.0.0? 8.0.1? Six months ago? Is there a major distro with the
latest llvm+btf? Do we want to lock everyone out of new libbpf features?
(Consider that a lot of people run on the LTS kernels).

What's wrong with having BTF be just syntactic sugar on top of
bpf_map_def? One major use-case is supporting iproute2 features,
but some of those features can go into bpf_map_def as well and
be used by non-BTF enabled users.

One other point to consider here might be pure Go libbpf that Lorenz is
maintaining. Having simple underlying bpf_map_def which we can agree
on might be beneficial.
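
To make the "keep extending bpf_map_def" idea concrete, here is a rough
sketch of the check a loader could apply to a definition that is larger
than the struct it knows about. This is not libbpf code: struct
known_map_def and parse_map_def() are made-up names for illustration, and
the field list is only an assumption about what "known" would cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields this loader version understands. */
struct known_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/*
 * Copy a definition of def_sz bytes, as found in the object file, into *out.
 * Shorter (older) definitions are accepted and the missing fields stay zero.
 * Longer (newer) definitions are accepted only if every byte past the part
 * we understand is zero; otherwise false is returned.
 */
static bool parse_map_def(const void *def, size_t def_sz,
			  struct known_map_def *out)
{
	const unsigned char *bytes = def;
	size_t known = sizeof(*out);
	size_t i;

	assert(def != NULL && out != NULL);

	memset(out, 0, known);
	if (def_sz <= known) {
		memcpy(out, def, def_sz);
		return true;
	}

	/* Unknown trailing fields must not carry a request we would drop. */
	for (i = known; i < def_sz; i++) {
		if (bytes[i] != 0)
			return false;
	}

	memcpy(out, def, known);
	return true;
}
```

With something like this, a program built against a newer definition
still loads on an older loader as long as it leaves the new fields zeroed,
and fails loudly when it actually depends on one of them.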

> > > > Also, (un)related: we don't currently use BTF internally, so if
> > > > you convert all tests, we'd be unable to run them :-(
> > >
> > > Not exactly sure what you mean by "we'd be unable to run them". Do you
> > > mean that you use old Clang that doesn't emit BTF? If that's what you
> > > are saying, a lot of tests already rely on latest Clang, so those
> > > tests already don't work for you, probably. I'll leave it up to Daniel
> > > and Alexei to decide if we want to convert selftests right now or not.
> > > I did it mostly to prove that we can handle all existing cases (and
> > > found few gotchas and bugs along the way, both in my implementation
> > > and in kernel - fixes coming soon).
> > Yes, I mean that we don't always use the latest features of clang,
> > so having the existing tests in the old form (at least for a while)
> > would be appreciated. Good candidates to showcase new format can
> > be features that explicitly require BTF, stuff like spinlocks.
> 
> I totally understand the concern, but I'll still defer to maintainers to
> make a call as to when to do conversion.
Sure, totally up to you and the maintainers. Just raising my voice,
so you'd at least consider not converting everything.

> > > > > Relying on BTF, this approach allows for both forward and backward
> > > > > compatibility w.r.t. extending supported map definition features. Old
> > > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > > implementations will parse and recognize new optional attributes.
> > > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > > In the kernel we require that the unknown fields are zeroed.
> > > > We probably need to do something like that here? What do you think
> > > > would be a good example of an optional attribute?
> > >
> > > Ignoring is required for forward-compatibility, where old libbpf will
> > > be used to load newer user BPF programs. We can decide not to do it,
> > > in that case it's just a question of erroring out on first unknown
> > > field. This RFC was posted exactly to discuss all these issues with
> > > more general community, as there is no single true way to do this.
> > >
> > > As for examples of when it can be used: it's any feature that can be
> > > considered optional or a hint, so if old libbpf doesn't do that, it's
> > > still not the end of the world (and we can live with that, or can
> > > correct using direct libbpf API calls).
> > In general, doing what we do right now with bpf_map_def (returning an error
> > for non-zero unknown options) seems like the safest option. We should
> > probably do the same with the unknown BTF fields (return an error
> > for non-zero value).
> 
> Yeah, as I replied to Jakub, libbpf already has strict/non-strict
> mode, we should probably do the same. The only potential difference is
> that there is no need to check for zeros and stuff: just don't define
> a field. And using an extra flag, we can allow more relaxed semantics
> (just debug/info/warn message on unknown fields). This is what
> __bpf_object__open_xattr does today with MAPS_RELAX_COMPAT flag.
> 
> >
> > For a general BTF case, we can have some predefined policy: if, for example,
> > the field name starts with an underscore, it's optional and doesn't require
> > non-zero check. (or the name ends with '_opt' or some other clear policy).
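A rough sketch of what such a naming policy could look like on the loader
side. field_is_optional() is a made-up helper, not an existing libbpf
function, and both naming rules are just the two options floated above:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Decide from the field name alone whether an unrecognized field may be
 * ignored. Assumed policy (illustrative only): a leading underscore or a
 * trailing "_opt" marks the field as optional; anything else is mandatory
 * and an unknown mandatory field should make the load fail.
 */
static bool field_is_optional(const char *name)
{
	static const char suffix[] = "_opt";
	size_t suffix_len = sizeof(suffix) - 1;
	size_t len;

	assert(name != NULL);

	if (name[0] == '_')
		return true;

	len = strlen(name);
	return len >= suffix_len &&
	       strcmp(name + len - suffix_len, suffix) == 0;
}
```

The loader would then error out on an unknown field only when this
returns false, which keeps the "reject what you don't understand" default
while still leaving room for hints.
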
> >
> > > > > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > > > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > > > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > > > >    maps in .maps sections. Everything will still work transparently.
> > > > > 2. The map declaration and initialization is done through
> > > > >    a global/static variable of a struct type with few mandatory and
> > > > >    extra optional fields:
> > > > >    - type field is mandatory and specifies the type of BPF map;
> > > > >    - key/value fields are mandatory and capture key/value type/size information;
> > > > >    - max_entries attribute is optional; if max_entries is not specified or
> > > > >      initialized, it has to be provided in runtime through libbpf API
> > > > >      before loading bpf_object;
> > > > >    - map_flags is optional and if not defined, will be assumed to be 0.
> > > > > 3. Key/value fields should be **a pointer** to a type describing
> > > > >    key/value. The pointee type is assumed (and will be recorded as such
> > > > >    and used for size determination) to be a type describing key/value of
> > > > >    the map. This is done to save excessive amounts of space allocated in
> > > > >    corresponding ELF sections for key/value of big size.
> > > > > 4. As some maps disallow having BTF type ID associated with key/value,
> > > > >    it's possible to specify key/value size explicitly without
> > > > >    associating BTF type ID with it. Use key_size and value_size fields
> > > > >    to do that (see example below).
> > > > >
> > > > > Here's an example of a simple ARRAY map definition:
> > > > >
> > > > > struct my_value { int x, y, z; };
> > > > >
> > > > > struct {
> > > > >       int type;
> > > > >       int max_entries;
> > > > >       int *key;
> > > > >       struct my_value *value;
> > > > > } btf_map SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_ARRAY,
> > > > >       .max_entries = 16,
> > > > > };
> > > > >
> > > > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> > > > > be of type int and thus key size will be 4 bytes. The value is struct
> > > > > my_value of size 12 bytes. This map can be used from C code exactly the
> > > > > same as with existing maps defined through struct bpf_map_def.
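For readers following along, the size arithmetic in this example can be
checked with plain C. This is only an illustration: libbpf would read the
pointee types from BTF rather than use sizeof, SKETCH_MAP_TYPE_ARRAY and
the MAP_*_SIZE macros are made-up names, and SEC(".maps") is left out so
the snippet builds as an ordinary host program (assuming 4-byte int):

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric value as BPF_MAP_TYPE_ARRAY in linux/bpf.h; defined locally
 * so the sketch compiles without kernel headers. */
enum { SKETCH_MAP_TYPE_ARRAY = 2 };

struct my_value { int x, y, z; };

static struct {
	int type;
	int max_entries;
	int *key;               /* pointer: only the pointee type matters */
	struct my_value *value; /* costs one pointer of space, not 12 bytes */
} btf_map = {
	.type = SKETCH_MAP_TYPE_ARRAY,
	.max_entries = 16,
};

/* sizeof does not evaluate its operand, so the NULL pointers are never
 * dereferenced; the result depends only on the pointee type. */
#define MAP_KEY_SIZE(m)   sizeof(*(m).key)
#define MAP_VALUE_SIZE(m) sizeof(*(m).value)

/* Key is an int, value is three ints with no padding between them. */
static_assert(MAP_KEY_SIZE(btf_map) == sizeof(int), "key is an int");
static_assert(MAP_VALUE_SIZE(btf_map) == 3 * sizeof(int), "value is 3 ints");
```

The key and value members stay NULL and are never read, which is why
declaring them as pointers keeps the variable small even for big values.
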
> > > > >
> > > > > Here's an example of STACKMAP definition (which currently disallows BTF type
> > > > > IDs for key/value):
> > > > >
> > > > > struct {
> > > > >       __u32 type;
> > > > >       __u32 max_entries;
> > > > >       __u32 map_flags;
> > > > >       __u32 key_size;
> > > > >       __u32 value_size;
> > > > > } stackmap SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_STACK_TRACE,
> > > > >       .max_entries = 128,
> > > > >       .map_flags = BPF_F_STACK_BUILD_ID,
> > > > >       .key_size = sizeof(__u32),
> > > > >       .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> > > > > };
> > > > >
> > > > > This approach is naturally extended to support map-in-map, by making the value
> > > > > field another struct that describes the inner map. This feature is not
> > > > > implemented yet. It's also possible to incrementally add features like pinning
> > > > > with full backwards and forward compatibility.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> > > > > ---
> > > > >  tools/lib/bpf/btf.h    |   1 +
> > > > >  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
> > > > >  2 files changed, 325 insertions(+), 9 deletions(-)