Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Fri, 1 Mar 2019 11:46:34 +0100

On 03/01/2019 07:53 AM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>>
>> This work adds BPF loader support for global data sections
>> to libbpf. This allows to write BPF programs in more natural
>> C-like way by being able to define global variables and const
>> data.
>>
>> Back at LPC 2018 [0] we presented a first prototype which
>> implemented support for global data sections by extending BPF
>> syscall where union bpf_attr would get additional memory/size
>> pair for each section passed during prog load in order to later
>> add this base address into the ldimm64 instruction along with
>> the user provided offset when accessing a variable. Consensus
>> from LPC was that for proper upstream support, it would be
>> more desirable to use maps instead of bpf_attr extension as
>> this would allow for introspection of these sections as well
>> as potential life updates of their content. This work follows
>> this path by taking the following steps from loader side:
>>
>>  1) In bpf_object__elf_collect() step we pick up ".data",
>>     ".rodata", and ".bss" section information.
>>
>>  2) If present, in bpf_object__init_global_maps() we create
>>     a map that corresponds to each of the present sections.
> 
> Is there any point in having .data and .bss in separate maps? I can
> only see for reasons of inspectiion from bpftool, but other than that
> isn't .bss just an optimization over .data to save space in ELF file,
> but in other regards is just another part of r/w .data section?

Hmm, I actually don't mind too much combining both of them. Had
the same thought with regards to introspection from bpftool which
was why I separated them. But combining the two into a single map
is fine actually, saves a bit of resources in kernel, and offsets
can easily be fixed up from libbpf side. Will do for v3.

>>     Given section size and access properties can differ, a
>>     single entry array map is created with value size that
>>     is corresponding to the ELF section size of .data, .bss
>>     or .rodata. In the latter case, the map is created as
>>     read-only from program side such that verifier rejects
>>     any write attempts into .rodata. In a subsequent step,
>>     for .data and .rodata sections, the section content is
>>     copied into the map through bpf_map_update_elem(). For
>>     .bss this is not necessary since array map is already
>>     zero-initialized by default.
> 
> For .rodata, ideally it would be nice to make it RDONLY from userland
> as well, except for first UPDATE. How hard is it to support that?

Right now the BPF_F_RDONLY, BPF_F_WRONLY semantics to make the
maps read-only or write-only from syscall side are that these
permissions are stored into the struct file front end (file->f_mode)
for the anon inode we use, meaning it's separated from the actual
BPF map, so you can create the map with BPF_F_RDONLY, but root
user can do BPF_MAP_GET_FD_BY_ID without the BPF_F_RDONLY and
again write into it. This design choice would require that we'd
need to add some additional infrastructure on top of this, which
would then need to enforce file->f_mode to read-only after the
first setup. I think there's simple trick we can apply to make
it read-only after setup from syscall side: we'll add a new flag
to the map, and then upon map creation libbpf sets everything
up, holds the id, closes its fd, and refetches the fd by id.
>From that point onwards any interface where you would get the
fd from the map in user space will enforce BPF_F_RDONLY behavior
for file->f_mode. Another, less hacky option could be to extend
the struct file ops we currently use for BPF maps and set a
map 'immutable' flag from there which is then enforced once all
pending operations have completed. I can look a bit into this.

>>  3) In bpf_program__collect_reloc() step, we record the
>>     corresponding map, insn index, and relocation type for
>>     the global data.
>>
>>  4) And last but not least in the actual relocation step in
>>     bpf_program__relocate(), we mark the ldimm64 instruction
>>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>>     imm field the map's file descriptor is stored as similarly
>>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>>     (as ldimm64 is 2-insn wide) we store the access offset
>>     into the section.
>>
>>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>>     load will then store the actual target address in order
>>     to have a 'map-lookup'-free access. That is, the actual
>>     map value base address + offset. The destination register
>>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>     containing the fixed offset as reg->off and backing BPF
>>     map as reg->map_ptr. Meaning, it's treated as any other
>>     normal map value from verification side, only with
>>     efficient, direct value access instead of actual call to
>>     map lookup helper as in the typical case.
>>
>> Simple example dump of program using globals vars in each
>> section:
>>
>>   # readelf -a test_global_data.o
>>   [...]
>>   [ 6] .bss              NOBITS           0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 7] .data             PROGBITS         0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
>>        0000000000000018  0000000000000000   A       0     0     8
>>   [...]
>>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>>   [...]
>>
>>   # bpftool prog
>>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>>        loaded_at 2019-02-28T02:02:35+0000  uid 0
>>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>>   # bpftool map show id 63
>>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
>>       key 4B  value 16B  max_entries 1  memlock 4096B
>>   # bpftool map show id 64
>>   64: array  name .data  flags 0x0                     <-- .data area, rw
>>       key 4B  value 16B  max_entries 1  memlock 4096B
>>   # bpftool map show id 65
>>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>>       key 4B  value 24B  max_entries 1  memlock 4096B
>>
>>   # bpftool prog dump xlated id 103
>>   int load_static_data(struct __sk_buff * skb):
>>   ; int load_static_data(struct __sk_buff *skb)
>>      0: (b7) r1 = 0
>>   ; key = 0;
>>      1: (63) *(u32 *)(r10 -4) = r1
>>      2: (bf) r6 = r10
>>   ; int load_static_data(struct __sk_buff *skb)
>>      3: (07) r6 += -4
>>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>>      4: (18) r1 = map[id:66]
>>      6: (bf) r2 = r6
>>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>>      9: (b7) r4 = 0
>>     10: (85) call array_map_update_elem#99888
>>     11: (b7) r1 = 1
>>   ; key = 1;
>>     12: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
>>     13: (18) r1 = map[id:66]
>>     15: (bf) r2 = r6
>>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>>     18: (b7) r4 = 0
>>     19: (85) call array_map_update_elem#99888
>>     20: (b7) r1 = 2
>>   ; key = 2;
>>     21: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>>     22: (18) r1 = map[id:66]
>>     24: (bf) r2 = r6
>>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>>     27: (b7) r4 = 0
>>     28: (85) call array_map_update_elem#99888
>>     29: (b7) r1 = 3
>>   ; key = 3;
>>     30: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>>     31: (18) r7 = map[id:63][0]+8         <--.
>>     33: (18) r1 = map[id:66]                 |
>>     35: (bf) r2 = r6                         |
>>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>>     38: (b7) r4 = 0
>>     39: (85) call array_map_update_elem#99888
>>   [...]
>>
>> For now .data/.rodata/.bss maps are not exposed via API to the
>> user, but this could be done in a subsequent step.
> 
> See comment about BPF_MAP_TYPE_HEAP/BLOB map in comments to patch #1,
> it would probably make more useful API for .data/.rodata/.bss.
> 
>>
>> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
>> fail for static variables").
>>
>> Joint work with Joe Stringer.
>>
>>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>>       http://vger.kernel.org/lpc-bpf2018.html#session-3
>>
>> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
>> Signed-off-by: Joe Stringer <joe@xxxxxxxxxxx>
>> ---
>>  tools/include/uapi/linux/bpf.h |  10 +-
>>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>>  2 files changed, 226 insertions(+), 43 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 8884072e1a46..04b26f59b413 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -287,7 +287,7 @@ enum bpf_attach_type {
>>
>>  #define BPF_OBJ_NAME_LEN 16U
>>
>> -/* Flags for accessing BPF object */
>> +/* Flags for accessing BPF object from syscall side. */
>>  #define BPF_F_RDONLY           (1U << 3)
>>  #define BPF_F_WRONLY           (1U << 4)
>>
>> @@ -297,6 +297,14 @@ enum bpf_attach_type {
>>  /* Zero-initialize hash function seed. This should only be used for testing. */
>>  #define BPF_F_ZERO_SEED                (1U << 6)
>>
>> +/* Flags for accessing BPF object from program side. */
>> +#define BPF_F_RDONLY_PROG      (1U << 7)
>> +#define BPF_F_WRONLY_PROG      (1U << 8)
>> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
>> +                                BPF_F_RDONLY_PROG |    \
>> +                                BPF_F_WRONLY |         \
>> +                                BPF_F_WRONLY_PROG)
>> +
>>  /* flags for BPF_PROG_QUERY */
>>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 8f8f688f3e9b..969bc3d9f02c 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -139,6 +139,9 @@ struct bpf_program {
>>                 enum {
>>                         RELO_LD64,
>>                         RELO_CALL,
>> +                       RELO_DATA,
>> +                       RELO_RODATA,
>> +                       RELO_BSS,
> 
> All three of those are essentially the same relocations, just applied
> against different ELF sections.
> I think by having just single RELO_GLOBAL_DATA you can actually
> simplify a bunch of code below, please see corresponding comments.

Ok, sounds like a reasonable simplification, will do all well for v3.

>>                 } type;
>>                 int insn_idx;
>>                 union {
>> @@ -174,7 +177,10 @@ struct bpf_program {
>>  struct bpf_map {
>>         int fd;
>>         char *name;
>> -       size_t offset;
>> +       union {
>> +               __u32 global_type;
> 
> This could be an index into common maps array.
> 
>> +               size_t offset;
>> +       };
>>         int map_ifindex;
>>         int inner_map_fd;
>>         struct bpf_map_def def;
>> @@ -194,6 +200,8 @@ struct bpf_object {
>>         size_t nr_programs;
>>         struct bpf_map *maps;
>>         size_t nr_maps;
>> +       struct bpf_map *maps_global;
>> +       size_t nr_maps_global;
> 
> Global maps could be stored in maps, along other ones, so that we
> don't need to keep track of them separately.
> 
> Another inconvenience of having a separate array of global maps is
> that bpf_map__iter won't iterate them. I don't know if that's
> desirable behavior or not, but it probably would be nice to iterate
> over global ones as well?

My thinking was that these maps are not explicitly user specified,
so libbpf API would expose them through a different interface than
the one we have today in order to not confuse or break application
behavior which would otherwise rely on iterating / processing over
them. Separate API would retain current behavior and definitely
make this unambiguous to apps with regards to what to expect from
each of such API call.

>>         bool loaded;
>>         bool has_pseudo_calls;
>> @@ -209,6 +217,9 @@ struct bpf_object {
>>                 Elf *elf;
>>                 GElf_Ehdr ehdr;
>>                 Elf_Data *symbols;
>> +               Elf_Data *global_data;
>> +               Elf_Data *global_rodata;
>> +               Elf_Data *global_bss;
>>                 size_t strtabidx;
>>                 struct {
>>                         GElf_Shdr shdr;
>> @@ -217,6 +228,9 @@ struct bpf_object {
>>                 int nr_reloc;
>>                 int maps_shndx;
>>                 int text_shndx;
>> +               int data_shndx;
>> +               int rodata_shndx;
>> +               int bss_shndx;
>>         } efile;
>>         /*
>>          * All loaded bpf_object is linked in a list, which is
>> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
>>         obj->efile.obj_buf = obj_buf;
>>         obj->efile.obj_buf_sz = obj_buf_sz;
>>         obj->efile.maps_shndx = -1;
>> +       obj->efile.data_shndx = -1;
>> +       obj->efile.rodata_shndx = -1;
>> +       obj->efile.bss_shndx = -1;
>>
>>         obj->loaded = false;
>>
>> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>>                 obj->efile.elf = NULL;
>>         }
>>         obj->efile.symbols = NULL;
>> +       obj->efile.global_data = NULL;
>> +       obj->efile.global_rodata = NULL;
>> +       obj->efile.global_bss = NULL;
>>
>>         zfree(&obj->efile.reloc);
>>         obj->efile.nr_reloc = 0;
>> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>>         return 0;
>>  }
>>
>> +static int
>> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
>> +
>> +static int
>> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
>> +                       const char *name, Elf_Data *map_data)
> 
> Instead of deducing flags and looking up for map by index, you can
> just pass struct bpf_map * directly instead of int i and provide
> flags, instead of type.

Yep, agree.

>> +{
>> +       struct bpf_map *map = &obj->maps_global[i];
>> +       struct bpf_map_def *def = &map->def;
>> +       char *cp, errmsg[STRERR_BUFSIZE];
>> +       int err, slot0 = 0;
>> +
>> +       def->type = BPF_MAP_TYPE_ARRAY;
>> +       def->key_size = sizeof(int);
>> +       def->value_size = map_data->d_size;
>> +       def->max_entries = 1;
>> +       def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
>> +
>> +       map->name = strdup(name);
>> +       map->global_type = type;
>> +       map->fd = bpf_object__create_map(obj, map);
>> +       if (map->fd < 0) {
>> +               err = map->fd;
>> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +               pr_warning("failed to create map (name: '%s'): %s\n",
>> +                          map->name, cp);
>> +               goto destroy;
>> +       }
>> +
>> +       pr_debug("create map %s: fd=%d\n", map->name, map->fd);
>> +
>> +       if (type != RELO_BSS) {
>> +               err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
>> +               if (err < 0) {
>> +                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +                       pr_warning("failed to update map (name: '%s'): %s\n",
>> +                                  map->name, cp);
>> +                       goto destroy;
>> +               }
>> +
>> +               pr_debug("updated map %s with elf data: fd=%d\n", map->name,
>> +                        map->fd);
>> +       }
>> +       return 0;
>> +destroy:
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               zclose(obj->maps_global[i].fd);
>> +       return err;
>> +}
>> +
>> +static int
>> +bpf_object__init_global_maps(struct bpf_object *obj)
>> +{
>> +       int nr_maps_global = (obj->efile.data_shndx >= 0) +
>> +                            (obj->efile.rodata_shndx >= 0) +
>> +                            (obj->efile.bss_shndx >= 0), i, err = 0;
> 
> This looks like a good candidate for separate static function? It can
> also be reused below to check if there is any global map present.

Sounds good.

>> +
>> +       obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
>> +       if (!obj->maps_global) {
> 
> If nr_maps_global is 0, calloc might or might not return NULL, so this
> check might erroneously return error.

Good point, just read it up as well from man page, will fix.

>> +               pr_warning("alloc maps for object failed\n");
>> +               return -ENOMEM;
>> +       }
>> +
>> +       obj->nr_maps_global = nr_maps_global;
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               obj->maps[i].fd = -1;
>> +       i = 0;
>> +       if (obj->efile.bss_shndx >= 0)
>> +               err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
>> +                                             obj->efile.global_bss);
>> +       if (obj->efile.data_shndx >= 0 && !err)
>> +               err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
>> +                                             obj->efile.global_data);
>> +       if (obj->efile.rodata_shndx >= 0 && !err)
>> +               err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
>> +                                             obj->efile.global_rodata);
> 
> Here we know exactly what type of map we are creating, so we can just
> directly pass all the required structs/flags/data.
> 
> Also, to speed up and simplify relocation processing below, I think
> it's better to store map indexes for each of available .bss, .data and
> .rodata maps, eliminating another need for having three different
> types of data relocations.

Yep, I'll clean this up.

>> +       return err;
>> +}
>> +
>>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
>>  {
>>         Elf_Scn *scn;
>> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                                         pr_warning("failed to alloc program %s (%s): %s",
>>                                                    name, obj->path, cp);
>>                                 }
>> +                       } else if (strcmp(name, ".data") == 0) {
>> +                               obj->efile.global_data = data;
>> +                               obj->efile.data_shndx = idx;
>> +                       } else if (strcmp(name, ".rodata") == 0) {
>> +                               obj->efile.global_rodata = data;
>> +                               obj->efile.rodata_shndx = idx;
>>                         }
> 
> Previously if we encountered unknown PROGBITS section, we'd emit debug
> message about skipping section, should we add that message here?

Sounds reasonable, I'll add a similar 'skip section' debug output there.

>>                 } else if (sh.sh_type == SHT_REL) {
>>                         void *reloc = obj->efile.reloc;
>> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                                 obj->efile.reloc[n].shdr = sh;
>>                                 obj->efile.reloc[n].data = data;
>>                         }
>> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
>> +                       obj->efile.global_bss = data;
>> +                       obj->efile.bss_shndx = idx;
>>                 } else {
>>                         pr_debug("skip section(%d) %s\n", idx, name);
>>                 }
>> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                 if (err)
>>                         goto out;
>>         }
>> +       if (obj->efile.data_shndx >= 0 ||
>> +           obj->efile.rodata_shndx >= 0 ||
>> +           obj->efile.bss_shndx >= 0) {
>> +               err = bpf_object__init_global_maps(obj);
>> +               if (err)
>> +                       goto out;
>> +       }
>> +
>>         err = bpf_object__init_prog_names(obj);
>>  out:
>>         return err;
>> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>         Elf_Data *symbols = obj->efile.symbols;
>>         int text_shndx = obj->efile.text_shndx;
>>         int maps_shndx = obj->efile.maps_shndx;
>> +       int data_shndx = obj->efile.data_shndx;
>> +       int rodata_shndx = obj->efile.rodata_shndx;
>> +       int bss_shndx = obj->efile.bss_shndx;
>> +       struct bpf_map *maps_global = obj->maps_global;
>> +       size_t nr_maps_global = obj->nr_maps_global;
>>         struct bpf_map *maps = obj->maps;
>>         size_t nr_maps = obj->nr_maps;
>>         int i, nrels;
>> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>                          (long long) (rel.r_info >> 32),
>>                          (long long) sym.st_value, sym.st_name);
>>
>> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
>> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
>> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
>> +                   sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
>> +                   sym.st_shndx != bss_shndx) {
>> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>>                                    prog->section_name, sym.st_shndx);
>>                         return -LIBBPF_ERRNO__RELOC;
>>                 }
>> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>                         prog->reloc_desc[i].type = RELO_LD64;
>>                         prog->reloc_desc[i].insn_idx = insn_idx;
>>                         prog->reloc_desc[i].map_idx = map_idx;
>> +               } else if (sym.st_shndx == data_shndx ||
>> +                          sym.st_shndx == rodata_shndx ||
>> +                          sym.st_shndx == bss_shndx) {
>> +                       int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
>> +                                  (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
>> +                                                                   RELO_BSS;
>> +
>> +                       for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
>> +                               if (maps_global[map_idx].global_type == type) {
>> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
>> +                                                map_idx, maps_global[map_idx].name, insn_idx);
>> +                                       break;
>> +                               }
>> +                       }
>> +
>> +                       if (map_idx >= nr_maps_global) {
>> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
>> +                                          (int)map_idx, (int)nr_maps_global - 1);
>> +                               return -LIBBPF_ERRNO__RELOC;
>> +                       }
> 
> We don't need to handle all of this if we just remember global map
> indicies during creation, instead of calculating type, we can just
> pick correct index (and check it exists). And type can be just generic
> RELO_DATA.
> 
>> +
>> +                       prog->reloc_desc[i].type = type;
>> +                       prog->reloc_desc[i].insn_idx = insn_idx;
>> +                       prog->reloc_desc[i].map_idx = map_idx;
>>                 }
>>         }
>>         return 0;
>> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>>  }
>>
>>  static int
>> -bpf_object__create_maps(struct bpf_object *obj)
>> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
>>  {
>>         struct bpf_create_map_attr create_attr = {};
>> +       struct bpf_map_def *def = &map->def;
>> +       char *cp, errmsg[STRERR_BUFSIZE];
>> +       int fd;
>> +
>> +       if (obj->caps.name)
>> +               create_attr.name = map->name;
>> +       create_attr.map_ifindex = map->map_ifindex;
>> +       create_attr.map_type = def->type;
>> +       create_attr.map_flags = def->map_flags;
>> +       create_attr.key_size = def->key_size;
>> +       create_attr.value_size = def->value_size;
>> +       create_attr.max_entries = def->max_entries;
>> +       create_attr.btf_fd = 0;
>> +       create_attr.btf_key_type_id = 0;
>> +       create_attr.btf_value_type_id = 0;
>> +       if (bpf_map_type__is_map_in_map(def->type) &&
>> +           map->inner_map_fd >= 0)
>> +               create_attr.inner_map_fd = map->inner_map_fd;
>> +       if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
>> +               create_attr.btf_fd = btf__fd(obj->btf);
>> +               create_attr.btf_key_type_id = map->btf_key_type_id;
>> +               create_attr.btf_value_type_id = map->btf_value_type_id;
>> +       }
>> +
>> +       fd = bpf_create_map_xattr(&create_attr);
>> +       if (fd < 0 && create_attr.btf_key_type_id) {
>> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +               pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
>> +                          map->name, cp, errno);
>> +
>> +               create_attr.btf_fd = 0;
>> +               create_attr.btf_key_type_id = 0;
>> +               create_attr.btf_value_type_id = 0;
>> +               map->btf_key_type_id = 0;
>> +               map->btf_value_type_id = 0;
>> +               fd = bpf_create_map_xattr(&create_attr);
>> +       }
>> +
>> +       return fd;
>> +}
>> +
>> +static int
>> +bpf_object__create_maps(struct bpf_object *obj)
>> +{
>>         unsigned int i;
>>         int err;
>>
>>         for (i = 0; i < obj->nr_maps; i++) {
>>                 struct bpf_map *map = &obj->maps[i];
>> -               struct bpf_map_def *def = &map->def;
>>                 char *cp, errmsg[STRERR_BUFSIZE];
>>                 int *pfd = &map->fd;
>>
>> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>>                                  map->name, map->fd);
>>                         continue;
>>                 }
>> -
>> -               if (obj->caps.name)
>> -                       create_attr.name = map->name;
>> -               create_attr.map_ifindex = map->map_ifindex;
>> -               create_attr.map_type = def->type;
>> -               create_attr.map_flags = def->map_flags;
>> -               create_attr.key_size = def->key_size;
>> -               create_attr.value_size = def->value_size;
>> -               create_attr.max_entries = def->max_entries;
>> -               create_attr.btf_fd = 0;
>> -               create_attr.btf_key_type_id = 0;
>> -               create_attr.btf_value_type_id = 0;
>> -               if (bpf_map_type__is_map_in_map(def->type) &&
>> -                   map->inner_map_fd >= 0)
>> -                       create_attr.inner_map_fd = map->inner_map_fd;
>> -
>> -               if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
>> -                       create_attr.btf_fd = btf__fd(obj->btf);
>> -                       create_attr.btf_key_type_id = map->btf_key_type_id;
>> -                       create_attr.btf_value_type_id = map->btf_value_type_id;
>> -               }
>> -
>> -               *pfd = bpf_create_map_xattr(&create_attr);
>> -               if (*pfd < 0 && create_attr.btf_key_type_id) {
>> -                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> -                       pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
>> -                                  map->name, cp, errno);
>> -                       create_attr.btf_fd = 0;
>> -                       create_attr.btf_key_type_id = 0;
>> -                       create_attr.btf_value_type_id = 0;
>> -                       map->btf_key_type_id = 0;
>> -                       map->btf_value_type_id = 0;
>> -                       *pfd = bpf_create_map_xattr(&create_attr);
>> -               }
>> -
>> +               *pfd = bpf_object__create_map(obj, map);
>>                 if (*pfd < 0) {
>>                         size_t j;
>>
>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>>                                                       &prog->reloc_desc[i]);
>>                         if (err)
>>                                 return err;
>> +               } else if (prog->reloc_desc[i].type == RELO_DATA ||
>> +                          prog->reloc_desc[i].type == RELO_RODATA ||
>> +                          prog->reloc_desc[i].type == RELO_BSS) {
>> +                       struct bpf_insn *insns = prog->insns;
>> +                       int insn_idx, map_idx, data_off;
>> +
>> +                       insn_idx = prog->reloc_desc[i].insn_idx;
>> +                       map_idx  = prog->reloc_desc[i].map_idx;
>> +                       data_off = insns[insn_idx].imm;
>> +
>> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
>> +                               pr_warning("relocation out of range: '%s'\n",
>> +                                          prog->section_name);
>> +                               return -LIBBPF_ERRNO__RELOC;
>> +                       }
>> +                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
>> +                       insns[insn_idx].imm = obj->maps_global[map_idx].fd;
>> +                       insns[insn_idx + 1].imm = data_off;
>>                 }
>>         }
>>
>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>>
>>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
>>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
>> +       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>>
>>         for (i = 0; i < obj->nr_maps; i++)
>>                 zclose(obj->maps[i].fd);
>> -
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               zclose(obj->maps_global[i].fd);
>>         for (i = 0; i < obj->nr_programs; i++)
>>                 bpf_program__unload(&obj->programs[i]);
>>
>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>>
>>         obj->loaded = true;
>>
>> -       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
>>         CHECK_ERR(bpf_object__relocate(obj), err, out);
>>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
>> --
>> 2.17.1
>>
> 
> I'm sorry if I seem a bit too obsessed with those three new relocation
> types. I just believe that having one generic and storing global maps
> along with other maps is cleaner and more uniform.

No worries, thanks for all your feedback and review!

Thanks,
Daniel