On Mon, 10 Jun 2019 01:17:13 +0000, Alexei Starovoitov wrote: > On 6/6/19 6:02 PM, Jakub Kicinski wrote: > > On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote: > >> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too. > >> That hack must go. > > > > I see. > > > >> If I understood your objections to Andrii's format is that > >> you don't like pointer part of key/value while Andrii explained > >> why we picked the pointer, right? > >> > >> So how about: > >> > >> struct { > >> int type; > >> int max_entries; > >> struct { > >> __u32 key; > >> struct my_value value; > >> } types[]; > >> } ... > > > > My objection is that k/v fields are never initialized, so they're > > "metafields", mixed with real fields which hold parameters - like > > type, max_entries etc. > > I don't share this meta fields vs real fields distinction. > All of the fields are meta. > Kernel implementation of the map doesn't need to hold type and > max_entries as actual configuration fields. > The map definition in c++ would have looked like: > bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo; > bpf::array_map<struct my_value, 2000> bar; > > Sometime key is not necessary. Sometimes flags have to be zero. > bpf syscall api is a superset of all fiels for all maps. > All of them are configuration and meta fields at the same time. > In c++ example there is really no difference between > 'struct my_value' and '1000' attributes. > > I'm pretty sure bpf will have C++ front-end in the future, > but until then we have to deal with C and, I think, the map > definition should be the most natural C syntax. > In that sense what you're proposing with extern: > > extern struct my_key my_key; > > extern int type_int; > > > > struct map_def { > > int type; > > int max_entries; > > void *btf_key_ref; > > void *btf_val_ref; > > } = { > > ... > > .btf_key_ref = &my_key, > > .btf_val_ref = &type_int, > > }; > > is worse than > > struct map_def { > int type; > int max_entries; > int btf_key; > struct my_key btf_value; > }; > > imo explicit key and value would be ideal, > but they take too much space. Hence pointers > or zero sized array: > struct { > int type; > int max_entries; > struct { > __u32 key; > struct my_value value; > } types[]; > }; It is a C syntax problem, I do agree with you that it works well for templates. The map_def structure holds parameters, and we can't take a type as a value in C. Hence the types[] in your proposal - you could as well call them ghost_fields[] :) > I think we should also consider explicit map creation. > > Something like: > > struct my_map { > __u32 key; > struct my_value value; > } *my_hash_map, *my_pinned_hash_map; > > struct { > __u64 key; > struct my_map *value; > } *my_hash_of_maps; > > struct { > struct my_map *value; > } *my_array_of_maps; > > __init void create_my_maps(void) > { > bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/); > bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map"); > bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/); > bpf_create_array_of_maps(&my_array_of_maps, 20); > } > > SEC("cgroup/skb") > int bpf_prog(struct __sk_buff *skb) > { > struct my_value *val; > __u32 key; > __u64 key64; > struct my_map *map; > > val = bpf_map_lookup(my_hash_map, &key); > map = bpf_map_lookup(my_hash_of_maps, &key64); > } > > '__init' section will be compiled by llvm into bpf instructions > that will be executed in users space by libbpf. > The __init prog has to succeed otherwise prog load fails. > > May be all map pointers should be in a special section to avoid > putting them into datasec, but libbpf should be able to figure that > out without requiring user to specify the .map section. > The rest of global vars would go into special datasec map. > > No llvm changes necessary and BTF is available for keys and values. > > libbpf can start with simple __init and eventually grow into > complex init procedure where maps are initialized, > prog_array is populated, etc. > > Thoughts? I like it! :)