explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF

Alexei Starovoitov <ast@xxxxxx> · Mon, 10 Jun 2019 01:17:13 +0000

On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
>> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
>> That hack must go.
> 
> I see.
> 
>> If I understood your objections to Andrii's format is that
>> you don't like pointer part of key/value while Andrii explained
>> why we picked the pointer, right?
>>
>> So how about:
>>
>> struct {
>>     int type;
>>     int max_entries;
>>     struct {
>>       __u32 key;
>>       struct my_value value;
>>     } types[];
>> } ...
> 
> My objection is that k/v fields are never initialized, so they're
> "metafields", mixed with real fields which hold parameters - like
> type, max_entries etc.

I don't share this meta fields vs real fields distinction.
All of the fields are meta.
Kernel implementation of the map doesn't need to hold type and
max_entries as actual configuration fields.
The map definition in c++ would have looked like:
bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
bpf::array_map<struct my_value, 2000> bar;

Sometime key is not necessary. Sometimes flags have to be zero.
bpf syscall api is a superset of all fiels for all maps.
All of them are configuration and meta fields at the same time.
In c++ example there is really no difference between
'struct my_value' and '1000' attributes.

I'm pretty sure bpf will have C++ front-end in the future,
but until then we have to deal with C and, I think, the map
definition should be the most natural C syntax.
In that sense what you're proposing with extern:
> extern struct my_key my_key;
> extern int type_int;
> 
> struct map_def {
>      int type;
>      int max_entries;
>      void *btf_key_ref;
>      void *btf_val_ref;
> } = {
>      ...
>      .btf_key_ref = &my_key,
>      .btf_val_ref = &type_int,
> };

is worse than

struct map_def {
       int type;
       int max_entries;
       int btf_key;
       struct my_key btf_value;
};

imo explicit key and value would be ideal,
but they take too much space. Hence pointers
or zero sized array:
struct {
      int type;
      int max_entries;
      struct {
        __u32 key;
        struct my_value value;
      } types[];
};

I think we should also consider explicit map creation.

Something like:

struct my_map {
   __u32 key;
   struct my_value value;
} *my_hash_map, *my_pinned_hash_map;

struct {
    __u64 key;
   struct my_map *value;
} *my_hash_of_maps;

struct {
   struct my_map *value;
} *my_array_of_maps;

__init void create_my_maps(void)
{
   bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
   bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
   bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
   bpf_create_array_of_maps(&my_array_of_maps, 20);
}

SEC("cgroup/skb")
int bpf_prog(struct __sk_buff *skb)
{
   struct my_value *val;
   __u32 key;
   __u64 key64;
   struct my_map *map;

   val = bpf_map_lookup(my_hash_map, &key);
   map = bpf_map_lookup(my_hash_of_maps, &key64);
}

'__init' section will be compiled by llvm into bpf instructions
that will be executed in users space by libbpf.
The __init prog has to succeed otherwise prog load fails.

May be all map pointers should be in a special section to avoid
putting them into datasec, but libbpf should be able to figure that
out without requiring user to specify the .map section.
The rest of global vars would go into special datasec map.

No llvm changes necessary and BTF is available for keys and values.

libbpf can start with simple __init and eventually grow into
complex init procedure where maps are initialized,
prog_array is populated, etc.

Thoughts?