Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> This work adds BPF loader support for global data sections
> to libbpf. This allows to write BPF programs in more natural
> C-like way by being able to define global variables and const
> data.
> 
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending BPF
> syscall where union bpf_attr would get additional memory/size
> pair for each section passed during prog load in order to later
> add this base address into the ldimm64 instruction along with
> the user provided offset when accessing a variable. Consensus
> from LPC was that for proper upstream support, it would be
> more desirable to use maps instead of bpf_attr extension as
> this would allow for introspection of these sections as well
> as potential life updates of their content. This work follows
> this path by taking the following steps from loader side:
> 
>   1) In bpf_object__elf_collect() step we pick up ".data",
>      ".rodata", and ".bss" section information.
> 
>   2) If present, in bpf_object__init_global_maps() we create
>      a map that corresponds to each of the present sections.
>      Given section size and access properties can differ, a
>      single entry array map is created with value size that
>      is corresponding to the ELF section size of .data, .bss
>      or .rodata. In the latter case, the map is created as
>      read-only from program side such that verifier rejects
>      any write attempts into .rodata. In a subsequent step,
>      for .data and .rodata sections, the section content is
>      copied into the map through bpf_map_update_elem(). For
>      .bss this is not necessary since array map is already
>      zero-initialized by default.
> 
>   3) In bpf_program__collect_reloc() step, we record the
>      corresponding map, insn index, and relocation type for
>      the global data.
> 
>   4) And last but not least in the actual relocation step in
>      bpf_program__relocate(), we mark the ldimm64 instruction
>      with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>      imm field the map's file descriptor is stored as similarly
>      done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>      (as ldimm64 is 2-insn wide) we store the access offset
>      into the section.
> 
>   5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>      load will then store the actual target address in order
>      to have a 'map-lookup'-free access. That is, the actual
>      map value base address + offset. The destination register
>      in the verifier will then be marked as PTR_TO_MAP_VALUE,
>      containing the fixed offset as reg->off and backing BPF
>      map as reg->map_ptr. Meaning, it's treated as any other
>      normal map value from verification side, only with
>      efficient, direct value access instead of actual call to
>      map lookup helper as in the typical case.
> 
> Simple example dump of program using globals vars in each
> section:
> 
>    # readelf -a test_global_data.o
>    [...]
>    [ 6] .bss              NOBITS           0000000000000000  00000328
>         0000000000000010  0000000000000000  WA       0     0     8
>    [ 7] .data             PROGBITS         0000000000000000  00000328
>         0000000000000010  0000000000000000  WA       0     0     8
>    [ 8] .rodata           PROGBITS         0000000000000000  00000338
>         0000000000000018  0000000000000000   A       0     0     8
>    [...]
>      95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>      96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>      97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>      98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>      99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>     100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>     101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>    [...]
> 
>    # bpftool prog
>    103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>         loaded_at 2019-02-28T02:02:35+0000  uid 0
>         xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>    # bpftool map show id 63
>    63: array  name .bss  flags 0x0                      <-- .bss area, rw
>        key 4B  value 16B  max_entries 1  memlock 4096B
>    # bpftool map show id 64
>    64: array  name .data  flags 0x0                     <-- .data area, rw
>        key 4B  value 16B  max_entries 1  memlock 4096B
>    # bpftool map show id 65
>    65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>        key 4B  value 24B  max_entries 1  memlock 4096B
> 
>    # bpftool prog dump xlated id 103
>    int load_static_data(struct __sk_buff * skb):
>    ; int load_static_data(struct __sk_buff *skb)
>       0: (b7) r1 = 0
>    ; key = 0;
>       1: (63) *(u32 *)(r10 -4) = r1
>       2: (bf) r6 = r10
>    ; int load_static_data(struct __sk_buff *skb)
>       3: (07) r6 += -4
>    ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>       4: (18) r1 = map[id:66]
>       6: (bf) r2 = r6
>       7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>       9: (b7) r4 = 0
>      10: (85) call array_map_update_elem#99888
>      11: (b7) r1 = 1
>    ; key = 1;
>      12: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_data, 0);
>      13: (18) r1 = map[id:66]
>      15: (bf) r2 = r6
>      16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>      18: (b7) r4 = 0
>      19: (85) call array_map_update_elem#99888
>      20: (b7) r1 = 2
>    ; key = 2;
>      21: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>      22: (18) r1 = map[id:66]
>      24: (bf) r2 = r6
>      25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>      27: (b7) r4 = 0
>      28: (85) call array_map_update_elem#99888
>      29: (b7) r1 = 3
>    ; key = 3;
>      30: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>      31: (18) r7 = map[id:63][0]+8         <--.
>      33: (18) r1 = map[id:66]                 |
>      35: (bf) r2 = r6                         |
>      36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>      38: (b7) r4 = 0
>      39: (85) call array_map_update_elem#99888
>    [...]
> 
> For now .data/.rodata/.bss maps are not exposed via API to the
> user, but this could be done in a subsequent step.
> 
> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> fail for static variables").
> 
> Joint work with Joe Stringer.
> 
>    [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>        http://vger.kernel.org/lpc-bpf2018.html#session-3
> 
> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Signed-off-by: Joe Stringer <joe@xxxxxxxxxxx>
> ---
>   tools/include/uapi/linux/bpf.h |  10 +-
>   tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>   2 files changed, 226 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 8884072e1a46..04b26f59b413 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> [...]
> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>   			 (long long) (rel.r_info >> 32),
>   			 (long long) sym.st_value, sym.st_name);
>   
> -		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> +		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> +		    sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> +		    sym.st_shndx != bss_shndx) {
> +			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>   				   prog->section_name, sym.st_shndx);
>   			return -LIBBPF_ERRNO__RELOC;
>   		}
> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>   			prog->reloc_desc[i].type = RELO_LD64;
>   			prog->reloc_desc[i].insn_idx = insn_idx;
>   			prog->reloc_desc[i].map_idx = map_idx;
> +		} else if (sym.st_shndx == data_shndx ||
> +			   sym.st_shndx == rodata_shndx ||
> +			   sym.st_shndx == bss_shndx) {
> +			int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> +				   (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> +								    RELO_BSS;
> +
> +			for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> +				if (maps_global[map_idx].global_type == type) {
> +					pr_debug("relocation: find map %zd (%s) for insn %u\n",
> +						 map_idx, maps_global[map_idx].name, insn_idx);
> +					break;
> +				}
> +			}
> +
> +			if (map_idx >= nr_maps_global) {
> +				pr_warning("bpf relocation: map_idx %d large than %d\n",
> +					   (int)map_idx, (int)nr_maps_global - 1);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +
> +			prog->reloc_desc[i].type = type;
> +			prog->reloc_desc[i].insn_idx = insn_idx;
> +			prog->reloc_desc[i].map_idx = map_idx;
>   		}
>   	}
>   	return 0;
> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>   }
>   
>   static int
[...]
> +
> +static int
> +bpf_object__create_maps(struct bpf_object *obj)
> +{
>   	unsigned int i;
>   	int err;
>   
>   	for (i = 0; i < obj->nr_maps; i++) {
>   		struct bpf_map *map = &obj->maps[i];
> -		struct bpf_map_def *def = &map->def;
>   		char *cp, errmsg[STRERR_BUFSIZE];
>   		int *pfd = &map->fd;
>   
> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>   				 map->name, map->fd);
>   			continue;
>   		}
> -
> -		if (obj->caps.name)
> -			create_attr.name = map->name;
> -		create_attr.map_ifindex = map->map_ifindex;
> -		create_attr.map_type = def->type;
> -		create_attr.map_flags = def->map_flags;
> -		create_attr.key_size = def->key_size;
> -		create_attr.value_size = def->value_size;
> -		create_attr.max_entries = def->max_entries;
> -		create_attr.btf_fd = 0;
> -		create_attr.btf_key_type_id = 0;
> -		create_attr.btf_value_type_id = 0;
> -		if (bpf_map_type__is_map_in_map(def->type) &&
> -		    map->inner_map_fd >= 0)
> -			create_attr.inner_map_fd = map->inner_map_fd;
> -
> -		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> -			create_attr.btf_fd = btf__fd(obj->btf);
> -			create_attr.btf_key_type_id = map->btf_key_type_id;
> -			create_attr.btf_value_type_id = map->btf_value_type_id;
> -		}
> -
> -		*pfd = bpf_create_map_xattr(&create_attr);
> -		if (*pfd < 0 && create_attr.btf_key_type_id) {
> -			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> -			pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> -				   map->name, cp, errno);
> -			create_attr.btf_fd = 0;
> -			create_attr.btf_key_type_id = 0;
> -			create_attr.btf_value_type_id = 0;
> -			map->btf_key_type_id = 0;
> -			map->btf_value_type_id = 0;
> -			*pfd = bpf_create_map_xattr(&create_attr);
> -		}
> -
> +		*pfd = bpf_object__create_map(obj, map);
>   		if (*pfd < 0) {
>   			size_t j;
>   
> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>   						      &prog->reloc_desc[i]);
>   			if (err)
>   				return err;
> +		} else if (prog->reloc_desc[i].type == RELO_DATA ||
> +			   prog->reloc_desc[i].type == RELO_RODATA ||
> +			   prog->reloc_desc[i].type == RELO_BSS) {
> +			struct bpf_insn *insns = prog->insns;
> +			int insn_idx, map_idx, data_off;
> +
> +			insn_idx = prog->reloc_desc[i].insn_idx;
> +			map_idx  = prog->reloc_desc[i].map_idx;
> +			data_off = insns[insn_idx].imm;

I want to point to a subtle difference here between handling pure global 
variables and static global variables. The "imm" value is only available
for static variables. For example,

-bash-4.4$ cat g.c
static volatile long sg = 2;
static volatile int si = 3;
long g = 4;
int i = 5;
int test() { return sg + si + g + i; }
-bash-4.4$
-bash-4.4$ clang -target bpf -O2 -c g.c 

-bash-4.4$ readelf -s g.o 


Symbol table '.symtab' contains 8 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
      2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
      3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
      4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
      5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
      6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
      7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
-bash-4.4$
-bash-4.4$ llvm-readelf -r g.o

Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
     Offset             Info             Type               Symbol's 
Value  Symbol's Name
0000000000000000  0000000400000001 R_BPF_64_64 
0000000000000000 .data
0000000000000018  0000000400000001 R_BPF_64_64 
0000000000000000 .data
0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
-bash-4.4$ llvm-objdump -d g.o

g.o:    file format ELF64-BPF

Disassembly of section .text:
0000000000000000 test:
        0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00 
r1 = 16 ll
        2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
        3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00 
r2 = 24 ll
        5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
        6:       0f 21 00 00 00 00 00 00         r1 += r2
        7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
r2 = 0 ll
        9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
       10:       0f 21 00 00 00 00 00 00         r1 += r2
       11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
r2 = 0 ll
       13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
       14:       0f 10 00 00 00 00 00 00         r0 += r1
       15:       95 00 00 00 00 00 00 00         exit
-bash-4.4$

You can see the above, the non-static global access does not have its
in-section offset encoded in the insn itself. The difference is due to
llvm treating static global and non-static global differently.

To support both cases, during relocation recording stage, you can
also record:
    . symbol binding (GELF_ST_BIND(sym.st_info)),
      non-static global has binding STB_GLOBAL and static
      global has binding STB_LOCAL
    . symbol value (sym.st_value)

During the above relocation resolution, if symbol bind is local, do
what you already did here. If symbol bind is global, assign data_off
with symbol value.

This applied to both .data and .rodata sections.

The non initialized
global variable will not be in any allocated section in ELF file,
it is in a COM section which is to be allocated by loader.
So user defines some like
    int g;
and later on uses it. Right now, it will not work. The workaround
is "int g = 4", or "static int g". I guess it should be
okay, we should encourage users to use "static" variables instead.

> +
> +			if (insn_idx + 1 >= (int)prog->insns_cnt) {
> +				pr_warning("relocation out of range: '%s'\n",
> +					   prog->section_name);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> +			insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> +			insns[insn_idx + 1].imm = data_off;
>   		}
>   	}
>   
> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>   
>   	CHECK_ERR(bpf_object__elf_init(obj), err, out);
>   	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> +	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>   	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>   	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>   	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>   
>   	for (i = 0; i < obj->nr_maps; i++)
>   		zclose(obj->maps[i].fd);
> -
> +	for (i = 0; i < obj->nr_maps_global; i++)
> +		zclose(obj->maps_global[i].fd);
>   	for (i = 0; i < obj->nr_programs; i++)
>   		bpf_program__unload(&obj->programs[i]);
>   
> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>   
>   	obj->loaded = true;
>   
> -	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>   	CHECK_ERR(bpf_object__create_maps(obj), err, out);
>   	CHECK_ERR(bpf_object__relocate(obj), err, out);
>   	CHECK_ERR(bpf_object__load_progs(obj), err, out);
> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux