Re: [PATCHv2 bpf-next 06/24] libbpf: Add elf symbol iterator

On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> Adding elf symbol iterator object (and some functions) that follow
> open-coded iterator pattern and some functions to ease up iterating
> elf object symbols.
>
> The idea is to iterate single symbol section with:
>
>   struct elf_symbol_iter iter;
>   struct elf_symbol *sym;
>
>   if (elf_symbol_iter_new(&iter, elf, binary_path, SHT_DYNSYM))
>         goto error;
>
>   while ((sym = elf_symbol_iter_next(&iter))) {
>         ...
>   }
>
> I considered opening the elf inside the iterator and iterate all symbol
> sections, but then it gets more complicated wrt user checks for when
> the next section is processed.
>
> Plus side is that we don't need 'exit' function, because caller/user is
> in charge of that.
>
> The returned iterated symbol object from elf_symbol_iter_next function
> is placed inside the struct elf_symbol_iter, so no extra allocation or
> argument is needed.
>
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
>  tools/lib/bpf/libbpf.c | 179 ++++++++++++++++++++++++++---------------
>  1 file changed, 114 insertions(+), 65 deletions(-)
>

This is great. Left a few nits below. I'm thinking maybe we should add
a separate elf.c file for all these ELF-related helpers and start
offloading code from libbpf.c, which has gotten pretty big already. WDYT?


> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index af52188daa80..cdac368c7ce1 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10824,6 +10824,109 @@ static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn)
>         return NULL;
>  }
>
> +struct elf_symbol {
> +       const char *name;
> +       unsigned long offset;
> +       int bind;
> +};
> +
> +struct elf_symbol_iter {

naming nits: elf_sym and elf_sym_iter? keep it short, keep it cool :)

> +       Elf *elf;
> +       Elf_Data *symbols;

syms :-P

> +       size_t nr_syms;
> +       size_t strtabidx;
> +       size_t idx;

next_sym_idx?

> +       struct elf_symbol sym;
> +};
> +
> +static int elf_symbol_iter_new(struct elf_symbol_iter *iter,
> +                              Elf *elf, const char *binary_path,
> +                              int sh_type)
> +{
> +       Elf_Scn *scn = NULL;
> +       GElf_Ehdr ehdr;
> +       GElf_Shdr sh;
> +
> +       memset(iter, 0, sizeof(*iter));
> +
> +       if (!gelf_getehdr(elf, &ehdr)) {
> +               pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
> +               return -LIBBPF_ERRNO__FORMAT;
> +       }
> +
> +       scn = elf_find_next_scn_by_type(elf, sh_type, NULL);
> +       if (!scn) {
> +               pr_debug("elf: failed to find symbol table ELF sections in '%s'\n",
> +                        binary_path);
> +               return -EINVAL;
> +       }
> +
> +       if (!gelf_getshdr(scn, &sh))
> +               return -EINVAL;
> +
> +       iter->strtabidx = sh.sh_link;
> +       iter->symbols = elf_getdata(scn, 0);
> +       if (!iter->symbols) {
> +               pr_warn("elf: failed to get symbols for symtab section in '%s': %s\n",
> +                       binary_path, elf_errmsg(-1));
> +               return -LIBBPF_ERRNO__FORMAT;
> +       }
> +       iter->nr_syms = iter->symbols->d_size / sh.sh_entsize;
> +       iter->elf = elf;
> +       return 0;
> +}
> +
> +static struct elf_symbol *elf_symbol_iter_next(struct elf_symbol_iter *iter)
> +{
> +       struct elf_symbol *ret = &iter->sym;
> +       unsigned long offset = 0;
> +       const char *name = NULL;
> +       GElf_Shdr sym_sh;
> +       Elf_Scn *sym_scn;
> +       GElf_Sym sym;
> +       size_t idx;
> +
> +       for (idx = iter->idx; idx < iter->nr_syms; idx++) {
> +               if (!gelf_getsym(iter->symbols, idx, &sym))
> +                       continue;
> +               if (GELF_ST_TYPE(sym.st_info) != STT_FUNC)
> +                       continue;

It would be more generic if this symbol type filter were a parameter to
the iterator instead of being hard-coded?

> +               name = elf_strptr(iter->elf, iter->strtabidx, sym.st_name);
> +               if (!name)
> +                       continue;
> +
> +               /* Transform symbol's virtual address (absolute for
> +                * binaries and relative for shared libs) into file
> +                * offset, which is what kernel is expecting for
> +                * uprobe/uretprobe attachment.
> +                * See Documentation/trace/uprobetracer.rst for more
> +                * details.
> +                * This is done by looking up symbol's containing
> +                * section's header and using iter's virtual address
> +                * (sh_addr) and corresponding file offset (sh_offset)
> +                * to transform sym.st_value (virtual address) into
> +                * desired final file offset.
> +                */
> +               sym_scn = elf_getscn(iter->elf, sym.st_shndx);
> +               if (!sym_scn)
> +                       continue;
> +               if (!gelf_getshdr(sym_scn, &sym_sh))
> +                       continue;
> +
> +               offset = sym.st_value - sym_sh.sh_addr + sym_sh.sh_offset;

I think this part is not really generic "let's iterate ELF symbols",
maybe let users of the iterator do this? We can have a helper to do the
translation if we need to do it in a few different places.
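
A rough sketch of what such a translation helper could look like. The
name elf_sym_offset() and its signature are hypothetical, not taken from
the patch, and it uses the plain <elf.h> 64-bit types instead of the
GElf_* wrappers so that it builds without libelf:

```c
#include <elf.h>

/* Hypothetical helper (illustrative name and signature): turn a symbol's
 * virtual address into a file offset, using the header of the section
 * that contains the symbol. This is the same arithmetic as in the quoted
 * patch, just moved out of the iterator.
 */
static unsigned long elf_sym_offset(const Elf64_Sym *sym,
				    const Elf64_Shdr *shdr)
{
	return sym->st_value - shdr->sh_addr + shdr->sh_offset;
}
```

With this, the iterator only hands back symbols, and callers that need a
uprobe offset look up the symbol's section header and call the helper.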

> +               break;
> +       }
> +
> +       /* we reached the last symbol */
> +       if (idx == iter->nr_syms)
> +               return NULL;
> +       iter->idx = idx + 1;
> +       ret->name = name;
> +       ret->bind = GELF_ST_BIND(sym.st_info);
> +       ret->offset = offset;

Why not just return the entire GElf_Sym information and let the user
process it as desired? So basically for each symbol you'll give back its
name, GElf_Sym info, and I'd return the symbol index as well. That will
keep this very generic for future uses.

> +       return ret;

I'd structure this a bit differently. If we got out of the loop, just
return NULL. Then inside the for loop, when we found the symbol, fill out
ret and return from inside the for loop. I think it's more
straightforward.
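
Roughly what that could look like with the comments above folded in
(shorter names, the symbol type as a parameter, the full symbol record
plus index returned, and the return inside the loop). This is only a
sketch: to stay self-contained it walks an in-memory Elf64_Sym array and
string table rather than calling gelf_getsym()/elf_strptr(), and the
field names and the elf_sym_iter_new() signature are assumptions.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Illustrative layout, not the patch's: name, full symbol, and index. */
struct elf_sym {
	const char *name;
	Elf64_Sym sym;		/* whole record, caller interprets it */
	size_t idx;		/* index within the symbol table */
};

struct elf_sym_iter {
	const Elf64_Sym *syms;	/* stand-in for the Elf_Data symbol table */
	const char *strtab;	/* stand-in for the linked string table */
	size_t nr_syms;
	size_t next_sym_idx;
	int st_type;		/* symbol type to match, e.g. STT_FUNC */
	struct elf_sym sym;	/* storage for the element handed back */
};

static void elf_sym_iter_new(struct elf_sym_iter *iter,
			     const Elf64_Sym *syms, size_t nr_syms,
			     const char *strtab, int st_type)
{
	memset(iter, 0, sizeof(*iter));
	iter->syms = syms;
	iter->nr_syms = nr_syms;
	iter->strtab = strtab;
	iter->st_type = st_type;
}

static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
{
	struct elf_sym *ret = &iter->sym;
	size_t idx;

	for (idx = iter->next_sym_idx; idx < iter->nr_syms; idx++) {
		const Elf64_Sym *sym = &iter->syms[idx];

		if (ELF64_ST_TYPE(sym->st_info) != iter->st_type)
			continue;

		/* found a match: fill out ret and return right here */
		iter->next_sym_idx = idx + 1;
		ret->name = iter->strtab + sym->st_name;
		ret->sym = *sym;
		ret->idx = idx;
		return ret;
	}

	/* fell out of the loop: no more matching symbols */
	return NULL;
}
```

The caller can then derive the binding with ELF64_ST_BIND(sym->sym.st_info)
and compute a file offset on its own, so none of that is baked into the
iterator.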

> +}
> +
>  /* Find offset of function name in the provided ELF object. "binary_path" is
>   * the path to the ELF binary represented by "elf", and only used for error
>   * reporting matters. "name" matches symbol name or name@@LIB for library

[...]




