Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
> > On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >>
> >>
> >> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> >>> This work adds BPF loader support for global data sections
> >>> to libbpf. This allows to write BPF programs in more natural
> >>> C-like way by being able to define global variables and const
> >>> data.
> >>>
> >>> Back at LPC 2018 [0] we presented a first prototype which
> >>> implemented support for global data sections by extending BPF
> >>> syscall where union bpf_attr would get additional memory/size
> >>> pair for each section passed during prog load in order to later
> >>> add this base address into the ldimm64 instruction along with
> >>> the user provided offset when accessing a variable. Consensus
> >>> from LPC was that for proper upstream support, it would be
> >>> more desirable to use maps instead of bpf_attr extension as
> >>> this would allow for introspection of these sections as well
> >>> as potential life updates of their content. This work follows
> >>> this path by taking the following steps from loader side:
> >>>
> >>>    1) In bpf_object__elf_collect() step we pick up ".data",
> >>>       ".rodata", and ".bss" section information.
> >>>
> >>>    2) If present, in bpf_object__init_global_maps() we create
> >>>       a map that corresponds to each of the present sections.
> >>>       Given section size and access properties can differ, a
> >>>       single entry array map is created with value size that
> >>>       is corresponding to the ELF section size of .data, .bss
> >>>       or .rodata. In the latter case, the map is created as
> >>>       read-only from program side such that verifier rejects
> >>>       any write attempts into .rodata. In a subsequent step,
> >>>       for .data and .rodata sections, the section content is
> >>>       copied into the map through bpf_map_update_elem(). For
> >>>       .bss this is not necessary since array map is already
> >>>       zero-initialized by default.
> >>>
> >>>    3) In bpf_program__collect_reloc() step, we record the
> >>>       corresponding map, insn index, and relocation type for
> >>>       the global data.
> >>>
> >>>    4) And last but not least in the actual relocation step in
> >>>       bpf_program__relocate(), we mark the ldimm64 instruction
> >>>       with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
> >>>       imm field the map's file descriptor is stored as similarly
> >>>       done as in BPF_PSEUDO_MAP_FD, and in the second imm field
> >>>       (as ldimm64 is 2-insn wide) we store the access offset
> >>>       into the section.
> >>>
> >>>    5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
> >>>       load will then store the actual target address in order
> >>>       to have a 'map-lookup'-free access. That is, the actual
> >>>       map value base address + offset. The destination register
> >>>       in the verifier will then be marked as PTR_TO_MAP_VALUE,
> >>>       containing the fixed offset as reg->off and backing BPF
> >>>       map as reg->map_ptr. Meaning, it's treated as any other
> >>>       normal map value from verification side, only with
> >>>       efficient, direct value access instead of actual call to
> >>>       map lookup helper as in the typical case.
> >>>
> >>> Simple example dump of program using globals vars in each
> >>> section:
> >>>
> >>>     # readelf -a test_global_data.o
> >>>     [...]
> >>>     [ 6] .bss              NOBITS           0000000000000000  00000328
> >>>          0000000000000010  0000000000000000  WA       0     0     8
> >>>     [ 7] .data             PROGBITS         0000000000000000  00000328
> >>>          0000000000000010  0000000000000000  WA       0     0     8
> >>>     [ 8] .rodata           PROGBITS         0000000000000000  00000338
> >>>          0000000000000018  0000000000000000   A       0     0     8
> >>>     [...]
> >>>       95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
> >>>       96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
> >>>       97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
> >>>       98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
> >>>       99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
> >>>      100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
> >>>      101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
> >>>     [...]
> >>>
> >>>     # bpftool prog
> >>>     103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
> >>>          loaded_at 2019-02-28T02:02:35+0000  uid 0
> >>>          xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
> >>>     # bpftool map show id 63
> >>>     63: array  name .bss  flags 0x0                      <-- .bss area, rw
> >>>         key 4B  value 16B  max_entries 1  memlock 4096B
> >>>     # bpftool map show id 64
> >>>     64: array  name .data  flags 0x0                     <-- .data area, rw
> >>>         key 4B  value 16B  max_entries 1  memlock 4096B
> >>>     # bpftool map show id 65
> >>>     65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
> >>>         key 4B  value 24B  max_entries 1  memlock 4096B
> >>>
> >>>     # bpftool prog dump xlated id 103
> >>>     int load_static_data(struct __sk_buff * skb):
> >>>     ; int load_static_data(struct __sk_buff *skb)
> >>>        0: (b7) r1 = 0
> >>>     ; key = 0;
> >>>        1: (63) *(u32 *)(r10 -4) = r1
> >>>        2: (bf) r6 = r10
> >>>     ; int load_static_data(struct __sk_buff *skb)
> >>>        3: (07) r6 += -4
> >>>     ; bpf_map_update_elem(&result, &key, &static_bss, 0);
> >>>        4: (18) r1 = map[id:66]
> >>>        6: (bf) r2 = r6
> >>>        7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
> >>>        9: (b7) r4 = 0
> >>>       10: (85) call array_map_update_elem#99888
> >>>       11: (b7) r1 = 1
> >>>     ; key = 1;
> >>>       12: (63) *(u32 *)(r10 -4) = r1
> >>>     ; bpf_map_update_elem(&result, &key, &static_data, 0);
> >>>       13: (18) r1 = map[id:66]
> >>>       15: (bf) r2 = r6
> >>>       16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
> >>>       18: (b7) r4 = 0
> >>>       19: (85) call array_map_update_elem#99888
> >>>       20: (b7) r1 = 2
> >>>     ; key = 2;
> >>>       21: (63) *(u32 *)(r10 -4) = r1
> >>>     ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
> >>>       22: (18) r1 = map[id:66]
> >>>       24: (bf) r2 = r6
> >>>       25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
> >>>       27: (b7) r4 = 0
> >>>       28: (85) call array_map_update_elem#99888
> >>>       29: (b7) r1 = 3
> >>>     ; key = 3;
> >>>       30: (63) *(u32 *)(r10 -4) = r1
> >>>     ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
> >>>       31: (18) r7 = map[id:63][0]+8         <--.
> >>>       33: (18) r1 = map[id:66]                 |
> >>>       35: (bf) r2 = r6                         |
> >>>       36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
> >>>       38: (b7) r4 = 0
> >>>       39: (85) call array_map_update_elem#99888
> >>>     [...]
> >>>
> >>> For now .data/.rodata/.bss maps are not exposed via API to the
> >>> user, but this could be done in a subsequent step.
> >>>
> >>> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> >>> fail for static variables").
> >>>
> >>> Joint work with Joe Stringer.
> >>>
> >>>     [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
> >>>         http://vger.kernel.org/lpc-bpf2018.html#session-3
> >>>
> >>> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> >>> Signed-off-by: Joe Stringer <joe@xxxxxxxxxxx>
> >>> ---
> >>>    tools/include/uapi/linux/bpf.h |  10 +-
> >>>    tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
> >>>    2 files changed, 226 insertions(+), 43 deletions(-)
> >>>
> >>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >>> index 8884072e1a46..04b26f59b413 100644
> >>> --- a/tools/include/uapi/linux/bpf.h
> >>> +++ b/tools/include/uapi/linux/bpf.h
> >>> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> >>> [...]
> >>> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>>                         (long long) (rel.r_info >> 32),
> >>>                         (long long) sym.st_value, sym.st_name);
> >>>
> >>> -             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> >>> -                     pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> >>> +             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> >>> +                 sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> >>> +                 sym.st_shndx != bss_shndx) {
> >>> +                     pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> >>>                                   prog->section_name, sym.st_shndx);
> >>>                        return -LIBBPF_ERRNO__RELOC;
> >>>                }
> >>> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>>                        prog->reloc_desc[i].type = RELO_LD64;
> >>>                        prog->reloc_desc[i].insn_idx = insn_idx;
> >>>                        prog->reloc_desc[i].map_idx = map_idx;
> >>> +             } else if (sym.st_shndx == data_shndx ||
> >>> +                        sym.st_shndx == rodata_shndx ||
> >>> +                        sym.st_shndx == bss_shndx) {
> >>> +                     int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> >>> +                                (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> >>> +                                                                 RELO_BSS;
> >>> +
> >>> +                     for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> >>> +                             if (maps_global[map_idx].global_type == type) {
> >>> +                                     pr_debug("relocation: find map %zd (%s) for insn %u\n",
> >>> +                                              map_idx, maps_global[map_idx].name, insn_idx);
> >>> +                                     break;
> >>> +                             }
> >>> +                     }
> >>> +
> >>> +                     if (map_idx >= nr_maps_global) {
> >>> +                             pr_warning("bpf relocation: map_idx %d large than %d\n",
> >>> +                                        (int)map_idx, (int)nr_maps_global - 1);
> >>> +                             return -LIBBPF_ERRNO__RELOC;
> >>> +                     }
> >>> +
> >>> +                     prog->reloc_desc[i].type = type;
> >>> +                     prog->reloc_desc[i].insn_idx = insn_idx;
> >>> +                     prog->reloc_desc[i].map_idx = map_idx;
> >>>                }
> >>>        }
> >>>        return 0;
> >>> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
> >>>    }
> >>>
> >>>    static int
> >> [...]
> >>> +
> >>> +static int
> >>> +bpf_object__create_maps(struct bpf_object *obj)
> >>> +{
> >>>        unsigned int i;
> >>>        int err;
> >>>
> >>>        for (i = 0; i < obj->nr_maps; i++) {
> >>>                struct bpf_map *map = &obj->maps[i];
> >>> -             struct bpf_map_def *def = &map->def;
> >>>                char *cp, errmsg[STRERR_BUFSIZE];
> >>>                int *pfd = &map->fd;
> >>>
> >>> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> >>>                                 map->name, map->fd);
> >>>                        continue;
> >>>                }
> >>> -
> >>> -             if (obj->caps.name)
> >>> -                     create_attr.name = map->name;
> >>> -             create_attr.map_ifindex = map->map_ifindex;
> >>> -             create_attr.map_type = def->type;
> >>> -             create_attr.map_flags = def->map_flags;
> >>> -             create_attr.key_size = def->key_size;
> >>> -             create_attr.value_size = def->value_size;
> >>> -             create_attr.max_entries = def->max_entries;
> >>> -             create_attr.btf_fd = 0;
> >>> -             create_attr.btf_key_type_id = 0;
> >>> -             create_attr.btf_value_type_id = 0;
> >>> -             if (bpf_map_type__is_map_in_map(def->type) &&
> >>> -                 map->inner_map_fd >= 0)
> >>> -                     create_attr.inner_map_fd = map->inner_map_fd;
> >>> -
> >>> -             if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >>> -                     create_attr.btf_fd = btf__fd(obj->btf);
> >>> -                     create_attr.btf_key_type_id = map->btf_key_type_id;
> >>> -                     create_attr.btf_value_type_id = map->btf_value_type_id;
> >>> -             }
> >>> -
> >>> -             *pfd = bpf_create_map_xattr(&create_attr);
> >>> -             if (*pfd < 0 && create_attr.btf_key_type_id) {
> >>> -                     cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >>> -                     pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >>> -                                map->name, cp, errno);
> >>> -                     create_attr.btf_fd = 0;
> >>> -                     create_attr.btf_key_type_id = 0;
> >>> -                     create_attr.btf_value_type_id = 0;
> >>> -                     map->btf_key_type_id = 0;
> >>> -                     map->btf_value_type_id = 0;
> >>> -                     *pfd = bpf_create_map_xattr(&create_attr);
> >>> -             }
> >>> -
> >>> +             *pfd = bpf_object__create_map(obj, map);
> >>>                if (*pfd < 0) {
> >>>                        size_t j;
> >>>
> >>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >>>                                                      &prog->reloc_desc[i]);
> >>>                        if (err)
> >>>                                return err;
> >>> +             } else if (prog->reloc_desc[i].type == RELO_DATA ||
> >>> +                        prog->reloc_desc[i].type == RELO_RODATA ||
> >>> +                        prog->reloc_desc[i].type == RELO_BSS) {
> >>> +                     struct bpf_insn *insns = prog->insns;
> >>> +                     int insn_idx, map_idx, data_off;
> >>> +
> >>> +                     insn_idx = prog->reloc_desc[i].insn_idx;
> >>> +                     map_idx  = prog->reloc_desc[i].map_idx;
> >>> +                     data_off = insns[insn_idx].imm;
> >>
> >> I want to point to a subtle difference here between handling pure global
> >> variables and static global variables. The "imm" value is only available
> >> for static variables. For example,
> >>
> >> -bash-4.4$ cat g.c
> >> static volatile long sg = 2;
> >> static volatile int si = 3;
> >> long g = 4;
> >> int i = 5;
> >> int test() { return sg + si + g + i; }
> >> -bash-4.4$
> >> -bash-4.4$ clang -target bpf -O2 -c g.c
> >>
> >> -bash-4.4$ readelf -s g.o
> >>
> >>
> >> Symbol table '.symtab' contains 8 entries:
> >>      Num:    Value          Size Type    Bind   Vis      Ndx Name
> >>        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
> >>        1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
> >>        2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
> >>        3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
> >>        4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
> >>        5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
> >>        6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
> >>        7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
> >> -bash-4.4$
> >> -bash-4.4$ llvm-readelf -r g.o
> >>
> >> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
> >>       Offset             Info             Type               Symbol's
> >> Value  Symbol's Name
> >> 0000000000000000  0000000400000001 R_BPF_64_64
> >> 0000000000000000 .data
> >> 0000000000000018  0000000400000001 R_BPF_64_64
> >> 0000000000000000 .data
> >> 0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
> >> 0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
> >> -bash-4.4$ llvm-objdump -d g.o
> >>
> >> g.o:    file format ELF64-BPF
> >>
> >> Disassembly of section .text:
> >> 0000000000000000 test:
> >>          0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00
> >> r1 = 16 ll
> >>          2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
> >>          3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00
> >> r2 = 24 ll
> >>          5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
> >>          6:       0f 21 00 00 00 00 00 00         r1 += r2
> >>          7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> r2 = 0 ll
> >>          9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
> >>         10:       0f 21 00 00 00 00 00 00         r1 += r2
> >>         11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> r2 = 0 ll
> >>         13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
> >>         14:       0f 10 00 00 00 00 00 00         r0 += r1
> >>         15:       95 00 00 00 00 00 00 00         exit
> >> -bash-4.4$
> >>
> >> You can see the above, the non-static global access does not have its
> >> in-section offset encoded in the insn itself. The difference is due to
> >> llvm treating static global and non-static global differently.
> >>
> >> To support both cases, during relocation recording stage, you can
> >> also record:
> >>      . symbol binding (GELF_ST_BIND(sym.st_info)),
> >>        non-static global has binding STB_GLOBAL and static
> >>        global has binding STB_LOCAL
> >>      . symbol value (sym.st_value)
> >>
> >> During the above relocation resolution, if symbol bind is local, do
> >> what you already did here. If symbol bind is global, assign data_off
> >> with symbol value.
> >>
> >> This applied to both .data and .rodata sections.
> >>
> >> The non initialized
> >> global variable will not be in any allocated section in ELF file,
> >> it is in a COM section which is to be allocated by loader.
> >> So user defines some like
> >>      int g;
> >> and later on uses it. Right now, it will not work. The workaround
> >> is "int g = 4", or "static int g". I guess it should be
> >> okay, we should encourage users to use "static" variables instead.
> >
> > Would it be reasonable to just plain disable usage of uninitialized
> > global variables, as it kind of goes against BPF's philosophy that
> > everything should be written to, before can be read? So while we can
> > just implicitly zero-out everything beforehand, it might be a good
> > idea to remind and enforce that explictly?
>
> There will be a verifier error, so the program with "int g" will not
> run, the same as today.

Yeah, I understand, but with pretty obscure error about not supporting
relocations and stuff, right?

>
> We could improve by flagging the error at compiler error or libbpf time.

So that's my point, that having compiler emit nicer error for
target=bpf would be nice touch to user experience :)

> But it is not required. I am mentioning just for completeness.
>
> >
> >>
> >>> +
> >>> +                     if (insn_idx + 1 >= (int)prog->insns_cnt) {
> >>> +                             pr_warning("relocation out of range: '%s'\n",
> >>> +                                        prog->section_name);
> >>> +                             return -LIBBPF_ERRNO__RELOC;
> >>> +                     }
> >>> +                     insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> >>> +                     insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> >>> +                     insns[insn_idx + 1].imm = data_off;
> >>>                }
> >>>        }
> >>>
> >>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
> >>>
> >>>        CHECK_ERR(bpf_object__elf_init(obj), err, out);
> >>>        CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> >>> +     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>>        CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
> >>>        CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
> >>>        CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> >>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
> >>>
> >>>        for (i = 0; i < obj->nr_maps; i++)
> >>>                zclose(obj->maps[i].fd);
> >>> -
> >>> +     for (i = 0; i < obj->nr_maps_global; i++)
> >>> +             zclose(obj->maps_global[i].fd);
> >>>        for (i = 0; i < obj->nr_programs; i++)
> >>>                bpf_program__unload(&obj->programs[i]);
> >>>
> >>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
> >>>
> >>>        obj->loaded = true;
> >>>
> >>> -     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>>        CHECK_ERR(bpf_object__create_maps(obj), err, out);
> >>>        CHECK_ERR(bpf_object__relocate(obj), err, out);
> >>>        CHECK_ERR(bpf_object__load_progs(obj), err, out);
> >>>



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux