Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 3 Jun 2019 14:54:53 -0700

On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski
<jakub.kicinski@xxxxxxxxxxxxx> wrote:
>
> On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> > > On 05/31, Andrii Nakryiko wrote:
> > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > captures type information of key and value naturally. This eliminates
> > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > always in sync with the key/value type.
> > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > (also, the compiler won't be able to help with the misspellings).
>
> I have mixed feelings, too.  Especially the key and value fields are
> very non-idiomatic for C :(  They never hold any value or data, while
> the other fields do.  That feels so awkward.  I'm no compiler expert,
> but even something like:
>
> struct map_def {
>         void *key_type_ref;
> } mamap = {
>         .key_type_ref = &(struct key_xyz){},
> };
>
> Would feel like less of a hack to me, and then map_def doesn't have to
> be different for every map.  But yea, IDK if it's easy to (a) resolve
> the type of what key_type points to, or (b) how to do this for scalar
> types.

The syntax for scalar would be &(int){0}, that compiles.

But there are a bunch of things that make it infeasible. So let's take
an example and see what's happening:

/* huge struct */
struct custom {int a; int b; int c; int d[1000000];};

struct {
        void *key;
        void *value;
} new_map = {
        .key = &(int){0},
        .value = &(struct custom){},
};

If we dump BTF, here's what we get:

$ bpftool btf dump file tail_call_test.o
[1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] FUNC 'main' type_id=1
[4] VAR '.compoundliteral' type_id=0, linkage=static
[5] VAR '.compoundliteral.1' type_id=0, linkage=static
[6] STRUCT '(anon)' size=24 vlen=3
        'type' type_id=2 bits_offset=0
        'key' type_id=7 bits_offset=64
        'value' type_id=7 bits_offset=128
[7] PTR '(anon)' type_id=0
[8] VAR 'new_map' type_id=6, linkage=global-alloc
[9] DATASEC '.bss' size=0 vlen=2
        type_id=4 offset=0 size=4
        type_id=5 offset=4 size=4000012
[10] DATASEC '.maps' size=0 vlen=1
        type_id=8 offset=0 size=24

So notice how we get two .bss entries, one for 4 bytes (for key, var
'compoundliteral') and another for 4MB (for huge struct, var
'.compoundliteral.1'). So while this won't increase the size of ELF,
it will force a huge .bss (and corresponding global data map) to be
created, which is no good.

Also, notice how there is no type information associated with [4] and
[5] vars, they are just of type void. There is no type information
about struct custom at all, though it might be (?) possible to fix it
by modifying compiler to preserve more type information.

So while the second one is a technical hurdle, which we might overcome
(not sure, actually), the issue with big .BSS is a showstopper for
some applications.

To eliminate .BSS issue, we'd need something like this to capture type
information:

struct {
        void *key;
        void *value;
} new_map = {
        .key = (int)0,
        .value = (struct custom *)0,
};

But that doesn't capture any type information for those type casts at
all, so more compiler work (if at all possible).

Which is why I think capturing type information using a standard
non-convoluted C way/syntax using a field declaration is the most
reliable, simple, and clean way. You do intialize key/value, it's just
a NULL pointer to corresponding type:

struct {
        int type;
        int *key;
        struct custom *value;
} new_map __attribute__((section(".maps"), used)) = {
        .type = 2,
        .key = (int)0,
        .value = (struct custom *)NULL,
};

Notice, btw, that this approach doesn't prevent you to re-use struct
definitions for multiple maps, if they have identical key/value types
or if you are not capturing type information at all.

struct my_typical_map {
        int type;
        int max_entries;
        u64 *key;
        struct custom *value;
};

struct my_typical_map map1 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 10,
};

struct my_typical_map map2 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 20,
};

Or, you can just re-use struct bpf_map_def today like this (but you
won't have type info for key/value, of course):

struct bpf_map_def my_map_without_type_info SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 100,
        .key_size = sizeof(u64),
        .value_size = sizeof(struct custom),
};

This approach gives you as much flexibility as possible, you only will
have to have different definition struct, if you have different
key/value type (in C++ that would be solved by templates, but alas we
are in C land).

>
> > I don't think it's really worse than current bpf_map_def approach. In
> > typical scenario, there are only two fields you need to remember: type
> > and max_entries (notice, they are called exactly the same as in
> > bpf_map_def, so this knowledge is transferrable). Then you'll have
> > key/value, using which you are describing both type (using field's
> > type) and size (calculated from the type).
> >
> > I can relate a bit to that with bpf_map_def you can find definition
> > and see all possible fields, but one can also find a lot of examples
> > for new map definitions as well.
> >
> > One big advantage of this scheme, though, is that you get that type
> > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > with no chance of having a mismatch, etc. This is less duplication (no
> > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > that macro) and there is no need to go and ping people to add those
> > annotations to improve introspection of BPF maps.
>
> > > > Relying on BTF, this approach allows for both forward and backward
> > > > compatibility w.r.t. extending supported map definition features. Old
> > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > implementations will parse and recognize new optional attributes.
> > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > In the kernel we require that the unknown fields are zeroed.
> > > We probably need to do something like that here? What do you think
> > > would be a good example of an optional attribute?
> >
> > Ignoring is required for forward-compatibility, where old libbpf will
> > be used to load newer user BPF programs. We can decided not to do it,
> > in that case it's just a question of erroring out on first unknown
> > field. This RFC was posted exactly to discuss all these issues with
> > more general community, as there is no single true way to do this.
> >
> > As for examples of when it can be used. It's any feature that can be
> > considered optional or a hint, so if old libbpf doesn't do that, it's
> > still not the end of the world (and we can live with that, or can
> > correct using direct libbpf API calls).
>
> On forward compatibility my 0.02c would be - if we want to go there
> and silently ignore fields it'd be good to have some form of "hard
> required" bit.  For TLVs ABIs it can be a "you have to understand
> this one" bit, for libbpf perhaps we could add a "min libbpf version
> required" section?  That kind of ties us ELF formats to libbpf
> specifics (the libbpf version presumably would imply support for
> features), but I think we want to go there, anyway.

I think we can go with strict/non-strict mode, which we already
support in libbpf with MAPS_RELAX_COMPAT flag (see
__bpf_object__open_xattr), would that work?