This work adds BPF loader support for global data sections to libbpf.
This allows writing BPF programs in a more natural C-like way by being
able to define global variables and const data.

Back at LPC 2018 [0] we presented a first prototype which implemented
support for global data sections by extending the BPF syscall, where
union bpf_attr would get an additional memory/size pair for each
section passed during prog load in order to later add this base address
into the ldimm64 instruction along with the user provided offset when
accessing a variable. Consensus from LPC was that for proper upstream
support, it would be more desirable to use maps instead of the bpf_attr
extension, as this would allow for introspection of these sections as
well as potential live updates of their content. This work follows this
path by taking the following steps from loader side:

 1) In bpf_object__elf_collect() step we pick up ".data", ".rodata",
    and ".bss" section information.

 2) If present, in bpf_object__init_global_maps() we create a map that
    corresponds to each of the present sections. Given section size and
    access properties can differ, a single entry array map is created
    per section, with a value size equal to the ELF section size of
    .data, .bss or .rodata. For .rodata, the map is created as
    read-only from program side such that the verifier rejects any
    write attempts into .rodata. In a subsequent step, for .data and
    .rodata sections, the section content is copied into the map
    through bpf_map_update_elem(). For .bss this is not necessary since
    an array map is already zero-initialized by default.

 3) In bpf_program__collect_reloc() step, we record the corresponding
    map, insn index, and relocation type for the global data.

 4) And last but not least, in the actual relocation step in
    bpf_program__relocate(), we mark the ldimm64 instruction with
    src_reg = BPF_PSEUDO_MAP_VALUE, where the first imm field stores
    the map's file descriptor, similarly to BPF_PSEUDO_MAP_FD, and the
    second imm field (as ldimm64 is 2-insn wide) stores the access
    offset into the section.

 5) On kernel side, this specially marked BPF_PSEUDO_MAP_VALUE load
    will then store the actual target address in order to have a
    'map-lookup'-free access. That is, the actual map value base
    address + offset. The destination register in the verifier will
    then be marked as PTR_TO_MAP_VALUE, containing the fixed offset as
    reg->off and the backing BPF map as reg->map_ptr. Meaning, it's
    treated as any other normal map value from verification side, only
    with efficient, direct value access instead of an actual call to
    the map lookup helper as in the typical case.

Simple example dump of a program using global vars in each section:

  # readelf -a test_global_data.o
  [...]
  [ 6] .bss              NOBITS           0000000000000000  00000328
       0000000000000010  0000000000000000  WA       0     0     8
  [ 7] .data             PROGBITS         0000000000000000  00000328
       0000000000000010  0000000000000000  WA       0     0     8
  [ 8] .rodata           PROGBITS         0000000000000000  00000338
       0000000000000018  0000000000000000   A       0     0     8
  [...]
    95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
    96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
    97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
    98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
    99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
   100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
   101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
  [...]

  # bpftool prog
  103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
       loaded_at 2019-02-28T02:02:35+0000  uid 0
       xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
  # bpftool map show id 63
  63: array  name .bss  flags 0x0                      <-- .bss area, rw
      key 4B  value 16B  max_entries 1  memlock 4096B
  # bpftool map show id 64
  64: array  name .data  flags 0x0                     <-- .data area, rw
      key 4B  value 16B  max_entries 1  memlock 4096B
  # bpftool map show id 65
  65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
      key 4B  value 24B  max_entries 1  memlock 4096B

  # bpftool prog dump xlated id 103
  int load_static_data(struct __sk_buff * skb):
  ; int load_static_data(struct __sk_buff *skb)
     0: (b7) r1 = 0
  ; key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r6 = r10
  ; int load_static_data(struct __sk_buff *skb)
     3: (07) r6 += -4
  ; bpf_map_update_elem(&result, &key, &static_bss, 0);
     4: (18) r1 = map[id:66]
     6: (bf) r2 = r6
     7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
     9: (b7) r4 = 0
    10: (85) call array_map_update_elem#99888
    11: (b7) r1 = 1
  ; key = 1;
    12: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_data, 0);
    13: (18) r1 = map[id:66]
    15: (bf) r2 = r6
    16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
    18: (b7) r4 = 0
    19: (85) call array_map_update_elem#99888
    20: (b7) r1 = 2
  ; key = 2;
    21: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
    22: (18) r1 = map[id:66]
    24: (bf) r2 = r6
    25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
    27: (b7) r4 = 0
    28: (85) call array_map_update_elem#99888
    29: (b7) r1 = 3
  ; key = 3;
    30: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
    31: (18) r7 = map[id:63][0]+8         <--.
    33: (18) r1 = map[id:66]                 |
    35: (bf) r2 = r6                         |
    36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
    38: (b7) r4 = 0
    39: (85) call array_map_update_elem#99888
  [...]

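To make the ldimm64 rewrite from step 4 more concrete, here is a
minimal stand-alone sketch. It is not the libbpf code: struct
sketch_insn is a simplified, hypothetical stand-in for struct bpf_insn
(without the bit-field packing), SKETCH_PSEUDO_MAP_VALUE is an assumed
placeholder for BPF_PSEUDO_MAP_VALUE, and sketch_relocate_global() is a
name made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct bpf_insn. The real
 * structure packs dst_reg/src_reg into 4-bit fields. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Assumed placeholder value; the real constant comes from linux/bpf.h. */
#define SKETCH_PSEUDO_MAP_VALUE 2

/*
 * Rewrite the 2-insn wide ldimm64 at insn_idx so that it refers to a
 * global data map: the first imm receives the map fd, the second imm
 * receives the offset into the section that the compiler had left in
 * the first imm.
 *
 * Returns 0 on success, -1 if the second half of the ldimm64 would lie
 * outside of the instruction array.
 */
static int sketch_relocate_global(struct sketch_insn *insns, size_t insn_cnt,
				  size_t insn_idx, int map_fd)
{
	int32_t data_off;

	if (insn_idx + 1 >= insn_cnt)
		return -1;

	data_off = insns[insn_idx].imm;

	insns[insn_idx].src_reg = SKETCH_PSEUDO_MAP_VALUE;
	insns[insn_idx].imm = map_fd;
	insns[insn_idx + 1].imm = data_off;
	return 0;
}
```

Under these assumptions, an access to static_bss2 would enter with
offset 8 in the first imm and leave with the .bss map fd in the first
imm and 8 in the second, which corresponds to the map[id:63][0]+8
operand in the xlated dump above.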
For now .data/.rodata/.bss maps are not exposed via API to the user,
but this could be done in a subsequent step.

Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").

Joint work with Joe Stringer.

  [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
      http://vger.kernel.org/lpc-bpf2018.html#session-3

Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Signed-off-by: Joe Stringer <joe@xxxxxxxxxxx>
---
 tools/include/uapi/linux/bpf.h |  10 +-
 tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
 2 files changed, 226 insertions(+), 43 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8884072e1a46..04b26f59b413 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -287,7 +287,7 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
@@ -297,6 +297,14 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG	(1U << 7)
+#define BPF_F_WRONLY_PROG	(1U << 8)
+#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
+				 BPF_F_RDONLY_PROG |	\
+				 BPF_F_WRONLY |		\
+				 BPF_F_WRONLY_PROG)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8f8f688f3e9b..969bc3d9f02c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -139,6 +139,9 @@ struct bpf_program {
 		enum {
 			RELO_LD64,
 			RELO_CALL,
+			RELO_DATA,
+			RELO_RODATA,
+			RELO_BSS,
 		} type;
 		int insn_idx;
 		union {
@@ -174,7 +177,10 @@ struct bpf_program {
 struct bpf_map {
 	int fd;
 	char *name;
-	size_t offset;
+	union {
+		__u32 global_type;
+		size_t offset;
+	};
 	int map_ifindex;
 	int inner_map_fd;
 	struct bpf_map_def def;
@@ -194,6 +200,8 @@ struct bpf_object {
 	size_t nr_programs;
 	struct bpf_map *maps;
 	size_t nr_maps;
+	struct bpf_map *maps_global;
+	size_t nr_maps_global;
 
 	bool loaded;
 	bool has_pseudo_calls;
@@ -209,6 +217,9 @@ struct bpf_object {
 		Elf *elf;
 		GElf_Ehdr ehdr;
 		Elf_Data *symbols;
+		Elf_Data *global_data;
+		Elf_Data *global_rodata;
+		Elf_Data *global_bss;
 		size_t strtabidx;
 		struct {
 			GElf_Shdr shdr;
@@ -217,6 +228,9 @@ struct bpf_object {
 		int nr_reloc;
 		int maps_shndx;
 		int text_shndx;
+		int data_shndx;
+		int rodata_shndx;
+		int bss_shndx;
 	} efile;
 	/*
 	 * All loaded bpf_object is linked in a list, which is
@@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf = obj_buf;
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
+	obj->efile.data_shndx = -1;
+	obj->efile.rodata_shndx = -1;
+	obj->efile.bss_shndx = -1;
 
 	obj->loaded = false;
 
@@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
 		obj->efile.elf = NULL;
 	}
 	obj->efile.symbols = NULL;
+	obj->efile.global_data = NULL;
+	obj->efile.global_rodata = NULL;
+	obj->efile.global_bss = NULL;
 
 	zfree(&obj->efile.reloc);
 	obj->efile.nr_reloc = 0;
@@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 	return 0;
 }
 
+static int
+bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
+
+static int
+bpf_object__init_global(struct bpf_object *obj, int i, int type,
+			const char *name, Elf_Data *map_data)
+{
+	struct bpf_map *map = &obj->maps_global[i];
+	struct bpf_map_def *def = &map->def;
+	char *cp, errmsg[STRERR_BUFSIZE];
+	int err, slot0 = 0;
+
+	def->type = BPF_MAP_TYPE_ARRAY;
+	def->key_size = sizeof(int);
+	def->value_size = map_data->d_size;
+	def->max_entries = 1;
+	def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
+
+	map->name = strdup(name);
+	map->global_type = type;
+	map->fd = bpf_object__create_map(obj, map);
+	if (map->fd < 0) {
+		err = map->fd;
+		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+		pr_warning("failed to create map (name: '%s'): %s\n",
+			   map->name, cp);
+		goto destroy;
+	}
+
+	pr_debug("create map %s: fd=%d\n", map->name, map->fd);
+
+	if (type != RELO_BSS) {
+		err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
+		if (err < 0) {
+			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+			pr_warning("failed to update map (name: '%s'): %s\n",
+				   map->name, cp);
+			goto destroy;
+		}
+
+		pr_debug("updated map %s with elf data: fd=%d\n", map->name,
+			 map->fd);
+	}
+	return 0;
+destroy:
+	for (i = 0; i < obj->nr_maps_global; i++)
+		zclose(obj->maps_global[i].fd);
+	return err;
+}
+
+static int
+bpf_object__init_global_maps(struct bpf_object *obj)
+{
+	int nr_maps_global = (obj->efile.data_shndx >= 0) +
+			     (obj->efile.rodata_shndx >= 0) +
+			     (obj->efile.bss_shndx >= 0), i, err = 0;
+
+	obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
+	if (!obj->maps_global) {
+		pr_warning("alloc maps for object failed\n");
+		return -ENOMEM;
+	}
+
+	obj->nr_maps_global = nr_maps_global;
+	for (i = 0; i < obj->nr_maps_global; i++)
+		obj->maps_global[i].fd = -1;
+	i = 0;
+	if (obj->efile.bss_shndx >= 0)
+		err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
+					      obj->efile.global_bss);
+	if (obj->efile.data_shndx >= 0 && !err)
+		err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
+					      obj->efile.global_data);
+	if (obj->efile.rodata_shndx >= 0 && !err)
+		err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
+					      obj->efile.global_rodata);
+	return err;
+}
+
 static bool section_have_execinstr(struct bpf_object *obj, int idx)
 {
 	Elf_Scn *scn;
@@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 					pr_warning("failed to alloc program %s (%s): %s",
 						   name, obj->path, cp);
 				}
+			} else if (strcmp(name, ".data") == 0) {
+				obj->efile.global_data = data;
+				obj->efile.data_shndx = idx;
+			} else if (strcmp(name, ".rodata") == 0) {
+				obj->efile.global_rodata = data;
+				obj->efile.rodata_shndx = idx;
 			}
 		} else if (sh.sh_type == SHT_REL) {
 			void *reloc = obj->efile.reloc;
@@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				obj->efile.reloc[n].shdr = sh;
 				obj->efile.reloc[n].data = data;
 			}
+		} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
+			obj->efile.global_bss = data;
+			obj->efile.bss_shndx = idx;
 		} else {
 			pr_debug("skip section(%d) %s\n", idx, name);
 		}
@@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 		if (err)
 			goto out;
 	}
+	if (obj->efile.data_shndx >= 0 ||
+	    obj->efile.rodata_shndx >= 0 ||
+	    obj->efile.bss_shndx >= 0) {
+		err = bpf_object__init_global_maps(obj);
+		if (err)
+			goto out;
+	}
+
 	err = bpf_object__init_prog_names(obj);
 out:
 	return err;
@@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	Elf_Data *symbols = obj->efile.symbols;
 	int text_shndx = obj->efile.text_shndx;
 	int maps_shndx = obj->efile.maps_shndx;
+	int data_shndx = obj->efile.data_shndx;
+	int rodata_shndx = obj->efile.rodata_shndx;
+	int bss_shndx = obj->efile.bss_shndx;
+	struct bpf_map *maps_global = obj->maps_global;
+	size_t nr_maps_global = obj->nr_maps_global;
 	struct bpf_map *maps = obj->maps;
 	size_t nr_maps = obj->nr_maps;
 	int i, nrels;
@@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			 (long long) (rel.r_info >> 32),
 			 (long long) sym.st_value, sym.st_name);
 
-		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
-			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
+		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
+		    sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
+		    sym.st_shndx != bss_shndx) {
+			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
 				   prog->section_name, sym.st_shndx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
@@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			prog->reloc_desc[i].type = RELO_LD64;
 			prog->reloc_desc[i].insn_idx = insn_idx;
 			prog->reloc_desc[i].map_idx = map_idx;
+		} else if (sym.st_shndx == data_shndx ||
+			   sym.st_shndx == rodata_shndx ||
+			   sym.st_shndx == bss_shndx) {
+			int type = (sym.st_shndx == data_shndx) ? RELO_DATA :
+				   (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
+				   RELO_BSS;
+
+			for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
+				if (maps_global[map_idx].global_type == type) {
+					pr_debug("relocation: find map %zd (%s) for insn %u\n",
+						 map_idx, maps_global[map_idx].name, insn_idx);
+					break;
+				}
+			}
+
+			if (map_idx >= nr_maps_global) {
+				pr_warning("bpf relocation: map_idx %d larger than %d\n",
+					   (int)map_idx, (int)nr_maps_global - 1);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+
+			prog->reloc_desc[i].type = type;
+			prog->reloc_desc[i].insn_idx = insn_idx;
+			prog->reloc_desc[i].map_idx = map_idx;
 		}
 	}
 	return 0;
@@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
 }
 
 static int
-bpf_object__create_maps(struct bpf_object *obj)
+bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
 {
 	struct bpf_create_map_attr create_attr = {};
+	struct bpf_map_def *def = &map->def;
+	char *cp, errmsg[STRERR_BUFSIZE];
+	int fd;
+
+	if (obj->caps.name)
+		create_attr.name = map->name;
+	create_attr.map_ifindex = map->map_ifindex;
+	create_attr.map_type = def->type;
+	create_attr.map_flags = def->map_flags;
+	create_attr.key_size = def->key_size;
+	create_attr.value_size = def->value_size;
+	create_attr.max_entries = def->max_entries;
+	create_attr.btf_fd = 0;
+	create_attr.btf_key_type_id = 0;
+	create_attr.btf_value_type_id = 0;
+	if (bpf_map_type__is_map_in_map(def->type) &&
+	    map->inner_map_fd >= 0)
+		create_attr.inner_map_fd = map->inner_map_fd;
+	if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
+		create_attr.btf_fd = btf__fd(obj->btf);
+		create_attr.btf_key_type_id = map->btf_key_type_id;
+		create_attr.btf_value_type_id = map->btf_value_type_id;
+	}
+
+	fd = bpf_create_map_xattr(&create_attr);
+	if (fd < 0 && create_attr.btf_key_type_id) {
+		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+		pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
+			   map->name, cp, errno);
+
+		create_attr.btf_fd = 0;
+		create_attr.btf_key_type_id = 0;
+		create_attr.btf_value_type_id = 0;
+		map->btf_key_type_id = 0;
+		map->btf_value_type_id = 0;
+		fd = bpf_create_map_xattr(&create_attr);
+	}
+
+	return fd;
+}
+
+static int
+bpf_object__create_maps(struct bpf_object *obj)
+{
 	unsigned int i;
 	int err;
 
 	for (i = 0; i < obj->nr_maps; i++) {
 		struct bpf_map *map = &obj->maps[i];
-		struct bpf_map_def *def = &map->def;
 		char *cp, errmsg[STRERR_BUFSIZE];
 		int *pfd = &map->fd;
 
@@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 				 map->name, map->fd);
 			continue;
 		}
-
-		if (obj->caps.name)
-			create_attr.name = map->name;
-		create_attr.map_ifindex = map->map_ifindex;
-		create_attr.map_type = def->type;
-		create_attr.map_flags = def->map_flags;
-		create_attr.key_size = def->key_size;
-		create_attr.value_size = def->value_size;
-		create_attr.max_entries = def->max_entries;
-		create_attr.btf_fd = 0;
-		create_attr.btf_key_type_id = 0;
-		create_attr.btf_value_type_id = 0;
-		if (bpf_map_type__is_map_in_map(def->type) &&
-		    map->inner_map_fd >= 0)
-			create_attr.inner_map_fd = map->inner_map_fd;
-
-		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
-			create_attr.btf_fd = btf__fd(obj->btf);
-			create_attr.btf_key_type_id = map->btf_key_type_id;
-			create_attr.btf_value_type_id = map->btf_value_type_id;
-		}
-
-		*pfd = bpf_create_map_xattr(&create_attr);
-		if (*pfd < 0 && create_attr.btf_key_type_id) {
-			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
-			pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
-				   map->name, cp, errno);
-			create_attr.btf_fd = 0;
-			create_attr.btf_key_type_id = 0;
-			create_attr.btf_value_type_id = 0;
-			map->btf_key_type_id = 0;
-			map->btf_value_type_id = 0;
-			*pfd = bpf_create_map_xattr(&create_attr);
-		}
-
+		*pfd = bpf_object__create_map(obj, map);
 		if (*pfd < 0) {
 			size_t j;
 
@@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 						    &prog->reloc_desc[i]);
 			if (err)
 				return err;
+		} else if (prog->reloc_desc[i].type == RELO_DATA ||
+			   prog->reloc_desc[i].type == RELO_RODATA ||
+			   prog->reloc_desc[i].type == RELO_BSS) {
+			struct bpf_insn *insns = prog->insns;
+			int insn_idx, map_idx, data_off;
+
+			insn_idx = prog->reloc_desc[i].insn_idx;
+			map_idx = prog->reloc_desc[i].map_idx;
+			data_off = insns[insn_idx].imm;
+
+			if (insn_idx + 1 >= (int)prog->insns_cnt) {
+				pr_warning("relocation out of range: '%s'\n",
+					   prog->section_name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
+			insns[insn_idx].imm = obj->maps_global[map_idx].fd;
+			insns[insn_idx + 1].imm = data_off;
 		}
 	}
 
@@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
 
 	CHECK_ERR(bpf_object__elf_init(obj), err, out);
 	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
+	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
 	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
 	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
 	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
@@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
 
 	for (i = 0; i < obj->nr_maps; i++)
 		zclose(obj->maps[i].fd);
-
+	for (i = 0; i < obj->nr_maps_global; i++)
+		zclose(obj->maps_global[i].fd);
 	for (i = 0; i < obj->nr_programs; i++)
 		bpf_program__unload(&obj->programs[i]);
 
@@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
 
 	obj->loaded = true;
 
-	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
 	CHECK_ERR(bpf_object__create_maps(obj), err, out);
 	CHECK_ERR(bpf_object__relocate(obj), err, out);
 	CHECK_ERR(bpf_object__load_progs(obj), err, out);
 
-- 
2.17.1