On 03/01, Daniel Borkmann wrote: > This work adds BPF loader support for global data sections > to libbpf. This allows to write BPF programs in more natural > C-like way by being able to define global variables and const > data. > > Back at LPC 2018 [0] we presented a first prototype which > implemented support for global data sections by extending BPF > syscall where union bpf_attr would get additional memory/size > pair for each section passed during prog load in order to later > add this base address into the ldimm64 instruction along with > the user provided offset when accessing a variable. Consensus > from LPC was that for proper upstream support, it would be > more desirable to use maps instead of bpf_attr extension as > this would allow for introspection of these sections as well > as potential life updates of their content. This work follows > this path by taking the following steps from loader side: > > 1) In bpf_object__elf_collect() step we pick up ".data", > ".rodata", and ".bss" section information. > > 2) If present, in bpf_object__init_global_maps() we create > a map that corresponds to each of the present sections. > Given section size and access properties can differ, a > single entry array map is created with value size that > is corresponding to the ELF section size of .data, .bss > or .rodata. In the latter case, the map is created as > read-only from program side such that verifier rejects > any write attempts into .rodata. In a subsequent step, > for .data and .rodata sections, the section content is > copied into the map through bpf_map_update_elem(). For > .bss this is not necessary since array map is already > zero-initialized by default. > > 3) In bpf_program__collect_reloc() step, we record the > corresponding map, insn index, and relocation type for > the global data. > > 4) And last but not least in the actual relocation step in > bpf_program__relocate(), we mark the ldimm64 instruction > with src_reg = BPF_PSEUDO_MAP_VALUE where in the first > imm field the map's file descriptor is stored as similarly > done as in BPF_PSEUDO_MAP_FD, and in the second imm field > (as ldimm64 is 2-insn wide) we store the access offset > into the section. > > 5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE > load will then store the actual target address in order > to have a 'map-lookup'-free access. That is, the actual > map value base address + offset. The destination register > in the verifier will then be marked as PTR_TO_MAP_VALUE, > containing the fixed offset as reg->off and backing BPF > map as reg->map_ptr. Meaning, it's treated as any other > normal map value from verification side, only with > efficient, direct value access instead of actual call to > map lookup helper as in the typical case. > > Simple example dump of program using globals vars in each > section: > > # readelf -a test_global_data.o > [...] > [ 6] .bss NOBITS 0000000000000000 00000328 > 0000000000000010 0000000000000000 WA 0 0 8 > [ 7] .data PROGBITS 0000000000000000 00000328 > 0000000000000010 0000000000000000 WA 0 0 8 > [ 8] .rodata PROGBITS 0000000000000000 00000338 > 0000000000000018 0000000000000000 A 0 0 8 > [...] > 95: 0000000000000000 8 OBJECT LOCAL DEFAULT 6 static_bss > 96: 0000000000000008 8 OBJECT LOCAL DEFAULT 6 static_bss2 > 97: 0000000000000000 8 OBJECT LOCAL DEFAULT 7 static_data > 98: 0000000000000008 8 OBJECT LOCAL DEFAULT 7 static_data2 > 99: 0000000000000000 8 OBJECT LOCAL DEFAULT 8 static_rodata > 100: 0000000000000008 8 OBJECT LOCAL DEFAULT 8 static_rodata2 > 101: 0000000000000010 8 OBJECT LOCAL DEFAULT 8 static_rodata3 > [...] > > # bpftool prog > 103: sched_cls name load_static_dat tag 37a8b6822fc39a29 gpl > loaded_at 2019-02-28T02:02:35+0000 uid 0 > xlated 712B jited 426B memlock 4096B map_ids 63,64,65,66 > # bpftool map show id 63 > 63: array name .bss flags 0x0 <-- .bss area, rw Can we use <main prog>.bss/data/rodata names? If we load more than one prog with global data that should make it easier to find which one is which. > key 4B value 16B max_entries 1 memlock 4096B > # bpftool map show id 64 > 64: array name .data flags 0x0 <-- .data area, rw > key 4B value 16B max_entries 1 memlock 4096B > # bpftool map show id 65 > 65: array name .rodata flags 0x80 <-- .rodata area, ro > key 4B value 24B max_entries 1 memlock 4096B > > # bpftool prog dump xlated id 103 > int load_static_data(struct __sk_buff * skb): > ; int load_static_data(struct __sk_buff *skb) > 0: (b7) r1 = 0 > ; key = 0; > 1: (63) *(u32 *)(r10 -4) = r1 > 2: (bf) r6 = r10 > ; int load_static_data(struct __sk_buff *skb) > 3: (07) r6 += -4 > ; bpf_map_update_elem(&result, &key, &static_bss, 0); > 4: (18) r1 = map[id:66] > 6: (bf) r2 = r6 > 7: (18) r3 = map[id:63][0]+0 <-- direct static_bss addr in .bss area > 9: (b7) r4 = 0 > 10: (85) call array_map_update_elem#99888 > 11: (b7) r1 = 1 > ; key = 1; > 12: (63) *(u32 *)(r10 -4) = r1 > ; bpf_map_update_elem(&result, &key, &static_data, 0); > 13: (18) r1 = map[id:66] > 15: (bf) r2 = r6 > 16: (18) r3 = map[id:64][0]+0 <-- direct static_data addr in .data area > 18: (b7) r4 = 0 > 19: (85) call array_map_update_elem#99888 > 20: (b7) r1 = 2 > ; key = 2; > 21: (63) *(u32 *)(r10 -4) = r1 > ; bpf_map_update_elem(&result, &key, &static_rodata, 0); > 22: (18) r1 = map[id:66] > 24: (bf) r2 = r6 > 25: (18) r3 = map[id:65][0]+0 <-- direct static_rodata addr in .rodata area > 27: (b7) r4 = 0 > 28: (85) call array_map_update_elem#99888 > 29: (b7) r1 = 3 > ; key = 3; > 30: (63) *(u32 *)(r10 -4) = r1 > ; bpf_map_update_elem(&result, &key, &static_bss2, 0); > 31: (18) r7 = map[id:63][0]+8 <--. > 33: (18) r1 = map[id:66] | > 35: (bf) r2 = r6 | > 36: (18) r3 = map[id:63][0]+8 <-- direct static_bss2 addr in .bss area > 38: (b7) r4 = 0 > 39: (85) call array_map_update_elem#99888 > [...] > > For now .data/.rodata/.bss maps are not exposed via API to the > user, but this could be done in a subsequent step. > > Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't > fail for static variables"). > > Joint work with Joe Stringer. > > [0] LPC 2018, BPF track, "ELF relocation for static data in BPF", > http://vger.kernel.org/lpc-bpf2018.html#session-3 > > Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > Signed-off-by: Joe Stringer <joe@xxxxxxxxxxx> > --- > tools/include/uapi/linux/bpf.h | 10 +- > tools/lib/bpf/libbpf.c | 259 +++++++++++++++++++++++++++------ > 2 files changed, 226 insertions(+), 43 deletions(-) > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 8884072e1a46..04b26f59b413 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -287,7 +287,7 @@ enum bpf_attach_type { > > #define BPF_OBJ_NAME_LEN 16U > > -/* Flags for accessing BPF object */ > +/* Flags for accessing BPF object from syscall side. */ > #define BPF_F_RDONLY (1U << 3) > #define BPF_F_WRONLY (1U << 4) > > @@ -297,6 +297,14 @@ enum bpf_attach_type { > /* Zero-initialize hash function seed. This should only be used for testing. */ > #define BPF_F_ZERO_SEED (1U << 6) > > +/* Flags for accessing BPF object from program side. */ > +#define BPF_F_RDONLY_PROG (1U << 7) > +#define BPF_F_WRONLY_PROG (1U << 8) > +#define BPF_F_ACCESS_MASK (BPF_F_RDONLY | \ > + BPF_F_RDONLY_PROG | \ > + BPF_F_WRONLY | \ > + BPF_F_WRONLY_PROG) > + > /* flags for BPF_PROG_QUERY */ > #define BPF_F_QUERY_EFFECTIVE (1U << 0) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index 8f8f688f3e9b..969bc3d9f02c 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -139,6 +139,9 @@ struct bpf_program { > enum { > RELO_LD64, > RELO_CALL, > + RELO_DATA, > + RELO_RODATA, > + RELO_BSS, > } type; > int insn_idx; > union { > @@ -174,7 +177,10 @@ struct bpf_program { > struct bpf_map { > int fd; > char *name; > - size_t offset; > + union { > + __u32 global_type; > + size_t offset; > + }; > int map_ifindex; > int inner_map_fd; > struct bpf_map_def def; > @@ -194,6 +200,8 @@ struct bpf_object { > size_t nr_programs; > struct bpf_map *maps; > size_t nr_maps; > + struct bpf_map *maps_global; > + size_t nr_maps_global; > > bool loaded; > bool has_pseudo_calls; > @@ -209,6 +217,9 @@ struct bpf_object { > Elf *elf; > GElf_Ehdr ehdr; > Elf_Data *symbols; > + Elf_Data *global_data; > + Elf_Data *global_rodata; > + Elf_Data *global_bss; > size_t strtabidx; > struct { > GElf_Shdr shdr; > @@ -217,6 +228,9 @@ struct bpf_object { > int nr_reloc; > int maps_shndx; > int text_shndx; > + int data_shndx; > + int rodata_shndx; > + int bss_shndx; > } efile; > /* > * All loaded bpf_object is linked in a list, which is > @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path, > obj->efile.obj_buf = obj_buf; > obj->efile.obj_buf_sz = obj_buf_sz; > obj->efile.maps_shndx = -1; > + obj->efile.data_shndx = -1; > + obj->efile.rodata_shndx = -1; > + obj->efile.bss_shndx = -1; > > obj->loaded = false; > > @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj) > obj->efile.elf = NULL; > } > obj->efile.symbols = NULL; > + obj->efile.global_data = NULL; > + obj->efile.global_rodata = NULL; > + obj->efile.global_bss = NULL; > > zfree(&obj->efile.reloc); > obj->efile.nr_reloc = 0; > @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags) > return 0; > } > > +static int > +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map); > + > +static int > +bpf_object__init_global(struct bpf_object *obj, int i, int type, > + const char *name, Elf_Data *map_data) > +{ > + struct bpf_map *map = &obj->maps_global[i]; > + struct bpf_map_def *def = &map->def; > + char *cp, errmsg[STRERR_BUFSIZE]; > + int err, slot0 = 0; > + > + def->type = BPF_MAP_TYPE_ARRAY; > + def->key_size = sizeof(int); > + def->value_size = map_data->d_size; > + def->max_entries = 1; > + def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0; > + > + map->name = strdup(name); > + map->global_type = type; > + map->fd = bpf_object__create_map(obj, map); > + if (map->fd < 0) { > + err = map->fd; > + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); > + pr_warning("failed to create map (name: '%s'): %s\n", > + map->name, cp); > + goto destroy; > + } > + > + pr_debug("create map %s: fd=%d\n", map->name, map->fd); > + > + if (type != RELO_BSS) { > + err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0); > + if (err < 0) { > + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); > + pr_warning("failed to update map (name: '%s'): %s\n", > + map->name, cp); > + goto destroy; > + } > + > + pr_debug("updated map %s with elf data: fd=%d\n", map->name, > + map->fd); > + } > + return 0; > +destroy: > + for (i = 0; i < obj->nr_maps_global; i++) > + zclose(obj->maps_global[i].fd); > + return err; > +} > + > +static int > +bpf_object__init_global_maps(struct bpf_object *obj) > +{ > + int nr_maps_global = (obj->efile.data_shndx >= 0) + > + (obj->efile.rodata_shndx >= 0) + > + (obj->efile.bss_shndx >= 0), i, err = 0; > + > + obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0])); > + if (!obj->maps_global) { > + pr_warning("alloc maps for object failed\n"); > + return -ENOMEM; > + } > + > + obj->nr_maps_global = nr_maps_global; > + for (i = 0; i < obj->nr_maps_global; i++) > + obj->maps[i].fd = -1; > + i = 0; > + if (obj->efile.bss_shndx >= 0) > + err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss", > + obj->efile.global_bss); > + if (obj->efile.data_shndx >= 0 && !err) > + err = bpf_object__init_global(obj, i++, RELO_DATA, ".data", > + obj->efile.global_data); > + if (obj->efile.rodata_shndx >= 0 && !err) > + err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata", > + obj->efile.global_rodata); > + return err; > +} > + > static bool section_have_execinstr(struct bpf_object *obj, int idx) > { > Elf_Scn *scn; > @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) > pr_warning("failed to alloc program %s (%s): %s", > name, obj->path, cp); > } > + } else if (strcmp(name, ".data") == 0) { > + obj->efile.global_data = data; > + obj->efile.data_shndx = idx; > + } else if (strcmp(name, ".rodata") == 0) { > + obj->efile.global_rodata = data; > + obj->efile.rodata_shndx = idx; > } > } else if (sh.sh_type == SHT_REL) { > void *reloc = obj->efile.reloc; > @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) > obj->efile.reloc[n].shdr = sh; > obj->efile.reloc[n].data = data; > } > + } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) { > + obj->efile.global_bss = data; > + obj->efile.bss_shndx = idx; > } else { > pr_debug("skip section(%d) %s\n", idx, name); > } > @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) > if (err) > goto out; > } > + if (obj->efile.data_shndx >= 0 || > + obj->efile.rodata_shndx >= 0 || > + obj->efile.bss_shndx >= 0) { > + err = bpf_object__init_global_maps(obj); > + if (err) > + goto out; > + } > + > err = bpf_object__init_prog_names(obj); > out: > return err; > @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr, > Elf_Data *symbols = obj->efile.symbols; > int text_shndx = obj->efile.text_shndx; > int maps_shndx = obj->efile.maps_shndx; > + int data_shndx = obj->efile.data_shndx; > + int rodata_shndx = obj->efile.rodata_shndx; > + int bss_shndx = obj->efile.bss_shndx; > + struct bpf_map *maps_global = obj->maps_global; > + size_t nr_maps_global = obj->nr_maps_global; > struct bpf_map *maps = obj->maps; > size_t nr_maps = obj->nr_maps; > int i, nrels; > @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr, > (long long) (rel.r_info >> 32), > (long long) sym.st_value, sym.st_name); > > - if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) { > - pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n", > + if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx && > + sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx && > + sym.st_shndx != bss_shndx) { > + pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n", > prog->section_name, sym.st_shndx); > return -LIBBPF_ERRNO__RELOC; > } > @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr, > prog->reloc_desc[i].type = RELO_LD64; > prog->reloc_desc[i].insn_idx = insn_idx; > prog->reloc_desc[i].map_idx = map_idx; > + } else if (sym.st_shndx == data_shndx || > + sym.st_shndx == rodata_shndx || > + sym.st_shndx == bss_shndx) { > + int type = (sym.st_shndx == data_shndx) ? RELO_DATA : > + (sym.st_shndx == rodata_shndx) ? RELO_RODATA : > + RELO_BSS; > + > + for (map_idx = 0; map_idx < nr_maps_global; map_idx++) { > + if (maps_global[map_idx].global_type == type) { > + pr_debug("relocation: find map %zd (%s) for insn %u\n", > + map_idx, maps_global[map_idx].name, insn_idx); > + break; > + } > + } > + > + if (map_idx >= nr_maps_global) { > + pr_warning("bpf relocation: map_idx %d large than %d\n", > + (int)map_idx, (int)nr_maps_global - 1); > + return -LIBBPF_ERRNO__RELOC; > + } > + > + prog->reloc_desc[i].type = type; > + prog->reloc_desc[i].insn_idx = insn_idx; > + prog->reloc_desc[i].map_idx = map_idx; > } > } > return 0; > @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj) > } > > static int > -bpf_object__create_maps(struct bpf_object *obj) > +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map) > { > struct bpf_create_map_attr create_attr = {}; > + struct bpf_map_def *def = &map->def; > + char *cp, errmsg[STRERR_BUFSIZE]; > + int fd; > + > + if (obj->caps.name) > + create_attr.name = map->name; > + create_attr.map_ifindex = map->map_ifindex; > + create_attr.map_type = def->type; > + create_attr.map_flags = def->map_flags; > + create_attr.key_size = def->key_size; > + create_attr.value_size = def->value_size; > + create_attr.max_entries = def->max_entries; > + create_attr.btf_fd = 0; > + create_attr.btf_key_type_id = 0; > + create_attr.btf_value_type_id = 0; > + if (bpf_map_type__is_map_in_map(def->type) && > + map->inner_map_fd >= 0) > + create_attr.inner_map_fd = map->inner_map_fd; > + if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) { > + create_attr.btf_fd = btf__fd(obj->btf); > + create_attr.btf_key_type_id = map->btf_key_type_id; > + create_attr.btf_value_type_id = map->btf_value_type_id; > + } > + > + fd = bpf_create_map_xattr(&create_attr); > + if (fd < 0 && create_attr.btf_key_type_id) { > + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); > + pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n", > + map->name, cp, errno); > + > + create_attr.btf_fd = 0; > + create_attr.btf_key_type_id = 0; > + create_attr.btf_value_type_id = 0; > + map->btf_key_type_id = 0; > + map->btf_value_type_id = 0; > + fd = bpf_create_map_xattr(&create_attr); > + } > + > + return fd; > +} > + > +static int > +bpf_object__create_maps(struct bpf_object *obj) > +{ > unsigned int i; > int err; > > for (i = 0; i < obj->nr_maps; i++) { > struct bpf_map *map = &obj->maps[i]; > - struct bpf_map_def *def = &map->def; > char *cp, errmsg[STRERR_BUFSIZE]; > int *pfd = &map->fd; > > @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj) > map->name, map->fd); > continue; > } > - > - if (obj->caps.name) > - create_attr.name = map->name; > - create_attr.map_ifindex = map->map_ifindex; > - create_attr.map_type = def->type; > - create_attr.map_flags = def->map_flags; > - create_attr.key_size = def->key_size; > - create_attr.value_size = def->value_size; > - create_attr.max_entries = def->max_entries; > - create_attr.btf_fd = 0; > - create_attr.btf_key_type_id = 0; > - create_attr.btf_value_type_id = 0; > - if (bpf_map_type__is_map_in_map(def->type) && > - map->inner_map_fd >= 0) > - create_attr.inner_map_fd = map->inner_map_fd; > - > - if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) { > - create_attr.btf_fd = btf__fd(obj->btf); > - create_attr.btf_key_type_id = map->btf_key_type_id; > - create_attr.btf_value_type_id = map->btf_value_type_id; > - } > - > - *pfd = bpf_create_map_xattr(&create_attr); > - if (*pfd < 0 && create_attr.btf_key_type_id) { > - cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); > - pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n", > - map->name, cp, errno); > - create_attr.btf_fd = 0; > - create_attr.btf_key_type_id = 0; > - create_attr.btf_value_type_id = 0; > - map->btf_key_type_id = 0; > - map->btf_value_type_id = 0; > - *pfd = bpf_create_map_xattr(&create_attr); > - } > - > + *pfd = bpf_object__create_map(obj, map); > if (*pfd < 0) { > size_t j; > > @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj) > &prog->reloc_desc[i]); > if (err) > return err; > + } else if (prog->reloc_desc[i].type == RELO_DATA || > + prog->reloc_desc[i].type == RELO_RODATA || > + prog->reloc_desc[i].type == RELO_BSS) { > + struct bpf_insn *insns = prog->insns; > + int insn_idx, map_idx, data_off; > + > + insn_idx = prog->reloc_desc[i].insn_idx; > + map_idx = prog->reloc_desc[i].map_idx; > + data_off = insns[insn_idx].imm; > + > + if (insn_idx + 1 >= (int)prog->insns_cnt) { > + pr_warning("relocation out of range: '%s'\n", > + prog->section_name); > + return -LIBBPF_ERRNO__RELOC; > + } > + insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE; > + insns[insn_idx].imm = obj->maps_global[map_idx].fd; > + insns[insn_idx + 1].imm = data_off; > } > } > > @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz, > > CHECK_ERR(bpf_object__elf_init(obj), err, out); > CHECK_ERR(bpf_object__check_endianness(obj), err, out); > + CHECK_ERR(bpf_object__probe_caps(obj), err, out); > CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out); > CHECK_ERR(bpf_object__collect_reloc(obj), err, out); > CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out); > @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj) > > for (i = 0; i < obj->nr_maps; i++) > zclose(obj->maps[i].fd); > - > + for (i = 0; i < obj->nr_maps_global; i++) > + zclose(obj->maps_global[i].fd); > for (i = 0; i < obj->nr_programs; i++) > bpf_program__unload(&obj->programs[i]); > > @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj) > > obj->loaded = true; > > - CHECK_ERR(bpf_object__probe_caps(obj), err, out); > CHECK_ERR(bpf_object__create_maps(obj), err, out); > CHECK_ERR(bpf_object__relocate(obj), err, out); > CHECK_ERR(bpf_object__load_progs(obj), err, out); > -- > 2.17.1 >