On Mon, 3 Jun 2019 14:54:53 -0700, Andrii Nakryiko wrote: > On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski wrote: > > On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote: > > > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote: > > > > On 05/31, Andrii Nakryiko wrote: > > > > > This patch adds support for a new way to define BPF maps. It relies on > > > > > BTF to describe mandatory and optional attributes of a map, as well as > > > > > captures type information of key and value naturally. This eliminates > > > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are > > > > > always in sync with the key/value type. > > > > My 2c: this is too magical and relies on me knowing the expected fields. > > > > (also, the compiler won't be able to help with the misspellings). > > > > I have mixed feelings, too. Especially the key and value fields are > > very non-idiomatic for C :( They never hold any value or data, while > > the other fields do. That feels so awkward. I'm no compiler expert, > > but even something like: > > > > struct map_def { > > void *key_type_ref; > > } mamap = { > > .key_type_ref = &(struct key_xyz){}, > > }; > > > > Would feel like less of a hack to me, and then map_def doesn't have to > > be different for every map. But yea, IDK if it's easy to (a) resolve > > the type of what key_type points to, or (b) how to do this for scalar > > types. > > The syntax for scalar would be &(int){0}, that compiles. > > But there are a bunch of things that make it infeasible. So let's take > an example and see what's happening: > > /* huge struct */ > struct custom {int a; int b; int c; int d[1000000];}; > > struct { > void *key; > void *value; > } new_map = { > .key = &(int){0}, > .value = &(struct custom){}, > }; > > If we dump BTF, here's what we get: > > $ bpftool btf dump file tail_call_test.o > [1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0 > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > [3] FUNC 'main' type_id=1 > [4] VAR '.compoundliteral' type_id=0, linkage=static > [5] VAR '.compoundliteral.1' type_id=0, linkage=static > [6] STRUCT '(anon)' size=24 vlen=3 > 'type' type_id=2 bits_offset=0 > 'key' type_id=7 bits_offset=64 > 'value' type_id=7 bits_offset=128 > [7] PTR '(anon)' type_id=0 > [8] VAR 'new_map' type_id=6, linkage=global-alloc > [9] DATASEC '.bss' size=0 vlen=2 > type_id=4 offset=0 size=4 > type_id=5 offset=4 size=4000012 > [10] DATASEC '.maps' size=0 vlen=1 > type_id=8 offset=0 size=24 > > So notice how we get two .bss entries, one for 4 bytes (for key, var > 'compoundliteral') and another for 4MB (for huge struct, var > '.compoundliteral.1'). So while this won't increase the size of ELF, > it will force a huge .bss (and corresponding global data map) to be > created, which is no good. > > Also, notice how there is no type information associated with [4] and > [5] vars, they are just of type void. There is no type information > about struct custom at all, though it might be (?) possible to fix it > by modifying compiler to preserve more type information. > > So while the second one is a technical hurdle, which we might overcome > (not sure, actually), the issue with big .BSS is a showstopper for > some applications. Ah :/ > To eliminate .BSS issue, we'd need something like this to capture type > information: Well, or we can track that the part of BSS is only referenced from map def, but that's hairy as well. > struct { > void *key; > void *value; > } new_map = { > .key = (int)0, > .value = (struct custom *)0, > }; > > But that doesn't capture any type information for those type casts at > all, so more compiler work (if at all possible). > > Which is why I think capturing type information using a standard > non-convoluted C way/syntax using a field declaration is the most > reliable, simple, and clean way. You do intialize key/value, it's just > a NULL pointer to corresponding type: > > struct { > int type; > int *key; > struct custom *value; > } new_map __attribute__((section(".maps"), used)) = { > .type = 2, > .key = (int)0, > .value = (struct custom *)NULL, > }; > > > Notice, btw, that this approach doesn't prevent you to re-use struct > definitions for multiple maps, if they have identical key/value types > or if you are not capturing type information at all. > > struct my_typical_map { > int type; > int max_entries; > u64 *key; > struct custom *value; > }; > > struct my_typical_map map1 SEC(".maps") = { > .type = BPF_MAP_TYPE_ARRAY, > .max_entries = 10, > }; > > struct my_typical_map map2 SEC(".maps") = { > .type = BPF_MAP_TYPE_ARRAY, > .max_entries = 20, > }; > > Or, you can just re-use struct bpf_map_def today like this (but you > won't have type info for key/value, of course): > > struct bpf_map_def my_map_without_type_info SEC(".maps") = { > .type = BPF_MAP_TYPE_ARRAY, > .max_entries = 100, > .key_size = sizeof(u64), > .value_size = sizeof(struct custom), > }; > > This approach gives you as much flexibility as possible, you only will > have to have different definition struct, if you have different > key/value type (in C++ that would be solved by templates, but alas we > are in C land). To be clear I'm not arguing that the proposal is not flexible. To a C guy like me having struct members which don't hold value mixed with struct members for storing data seems very dirty, dare I say the BPF_ANNOTATE_KV_PAIR() construct feels cleaner :( I like that it has the "meta structure" separate from the actual def structure. In a sense this would feel better to me: BPF_DECLARE_KV_PAIR(kv_ref, struct key, struct val); struct { .. } def = { .btf_ref = &kv_ref, }; BPF_DECLARE_KV_PAIR() can stuff the structure into a custom ignored section. Initially I was thinking to try to force a relocation to avoid the BSS issue: extern struct key_ext; extern struct value_ext; struct def { void *key; void *value; } new_map = { .key = &key_ext, .value = &value_ext, }; But then I'm not super happy with needed the extern declarations :( IDK, I would really like a cleaner solution, but perhaps there is none ;) > > > > > Relying on BTF, this approach allows for both forward and backward > > > > > compatibility w.r.t. extending supported map definition features. Old > > > > > libbpf implementation will ignore fields it doesn't recognize, while new > > > > > implementations will parse and recognize new optional attributes. > > > > I also don't know how to feel about old libbpf ignoring some attributes. > > > > In the kernel we require that the unknown fields are zeroed. > > > > We probably need to do something like that here? What do you think > > > > would be a good example of an optional attribute? > > > > > > Ignoring is required for forward-compatibility, where old libbpf will > > > be used to load newer user BPF programs. We can decided not to do it, > > > in that case it's just a question of erroring out on first unknown > > > field. This RFC was posted exactly to discuss all these issues with > > > more general community, as there is no single true way to do this. > > > > > > As for examples of when it can be used. It's any feature that can be > > > considered optional or a hint, so if old libbpf doesn't do that, it's > > > still not the end of the world (and we can live with that, or can > > > correct using direct libbpf API calls). > > > > On forward compatibility my 0.02c would be - if we want to go there > > and silently ignore fields it'd be good to have some form of "hard > > required" bit. For TLVs ABIs it can be a "you have to understand > > this one" bit, for libbpf perhaps we could add a "min libbpf version > > required" section? That kind of ties us ELF formats to libbpf > > specifics (the libbpf version presumably would imply support for > > features), but I think we want to go there, anyway. > > I think we can go with strict/non-strict mode, which we already > support in libbpf with MAPS_RELAX_COMPAT flag (see > __bpf_object__open_xattr), would that work? I'd be a lil worried that all or nothing may not be flexible enough. IOW someone may need a future from 5.5 but optionally want a 5.10 features if available? No strong feelings, tho.