On Fri, Jun 23, 2023 at 1:19 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > On Thu, Jun 22, 2023 at 05:31:58PM -0700, Andrii Nakryiko wrote: > > On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote: > > > > > > Adding elf symbol iterator object (and some functions) that follow > > > open-coded iterator pattern and some functions to ease up iterating > > > elf object symbols. > > > > > > The idea is to iterate single symbol section with: > > > > > > struct elf_symbol_iter iter; > > > struct elf_symbol *sym; > > > > > > if (elf_symbol_iter_new(&iter, elf, binary_path, SHT_DYNSYM)) > > > goto error; > > > > > > while ((sym = elf_symbol_iter_next(&iter))) { > > > ... > > > } > > > > > > I considered opening the elf inside the iterator and iterate all symbol > > > sections, but then it gets more complicated wrt user checks for when > > > the next section is processed. > > > > > > Plus side is the we don't need 'exit' function, because caller/user is > > > in charge of that. > > > > > > The returned iterated symbol object from elf_symbol_iter_next function > > > is placed inside the struct elf_symbol_iter, so no extra allocation or > > > argument is needed. > > > > > > Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx> > > > --- > > > tools/lib/bpf/libbpf.c | 179 ++++++++++++++++++++++++++--------------- > > > 1 file changed, 114 insertions(+), 65 deletions(-) > > > > > > > This is great. Left a few nits below. I'm thinkin maybe we should add > > a separate elf.c file for all these ELF-related helpers and start > > offloading code from libbpf.c, which got pretty big already. WDYT? > > yes, I thought doing the move after this is merged might be better, > because it's quite big already > > > > > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > > > index af52188daa80..cdac368c7ce1 100644 > > > --- a/tools/lib/bpf/libbpf.c > > > +++ b/tools/lib/bpf/libbpf.c > > > @@ -10824,6 +10824,109 @@ static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn) > > > return NULL; > > > } > > > > > > +struct elf_symbol { > > > + const char *name; > > > + unsigned long offset; > > > + int bind; > > > +}; > > > + > > > +struct elf_symbol_iter { > > > > naming nits: elf_sym and elf_sym_iter? keep it short, keep it cool :) > > ok > > > > > > + Elf *elf; > > > + Elf_Data *symbols; > > > > syms :-P > > ook ;-) > > > > > > + size_t nr_syms; > > > + size_t strtabidx; > > > + size_t idx; > > > > next_sym_idx? > > ok > > > > > > + struct elf_symbol sym; > > > +}; > > > + > > > +static int elf_symbol_iter_new(struct elf_symbol_iter *iter, > > > + Elf *elf, const char *binary_path, > > > + int sh_type) > > > +{ > > > + Elf_Scn *scn = NULL; > > > + GElf_Ehdr ehdr; > > > + GElf_Shdr sh; > > > + > > > + memset(iter, 0, sizeof(*iter)); > > > + > > > + if (!gelf_getehdr(elf, &ehdr)) { > > > + pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1)); > > > + return -LIBBPF_ERRNO__FORMAT; > > > + } > > > + > > > + scn = elf_find_next_scn_by_type(elf, sh_type, NULL); > > > + if (!scn) { > > > + pr_debug("elf: failed to find symbol table ELF sections in '%s'\n", > > > + binary_path); > > > + return -EINVAL; > > > + } > > > + > > > + if (!gelf_getshdr(scn, &sh)) > > > + return -EINVAL; > > > + > > > + iter->strtabidx = sh.sh_link; > > > + iter->symbols = elf_getdata(scn, 0); > > > + if (!iter->symbols) { > > > + pr_warn("elf: failed to get symbols for symtab section in '%s': %s\n", > > > + binary_path, elf_errmsg(-1)); > > > + return -LIBBPF_ERRNO__FORMAT; > > > + } > > > + iter->nr_syms = iter->symbols->d_size / sh.sh_entsize; > > > + iter->elf = elf; > > > + return 0; > > > +} > > > + > > > +static struct elf_symbol *elf_symbol_iter_next(struct elf_symbol_iter *iter) > > > +{ > > > + struct elf_symbol *ret = &iter->sym; > > > + unsigned long offset = 0; > > > + const char *name = NULL; > > > + GElf_Shdr sym_sh; > > > + Elf_Scn *sym_scn; > > > + GElf_Sym sym; > > > + size_t idx; > > > + > > > + for (idx = iter->idx; idx < iter->nr_syms; idx++) { > > > + if (!gelf_getsym(iter->symbols, idx, &sym)) > > > + continue; > > > + if (GELF_ST_TYPE(sym.st_info) != STT_FUNC) > > > + continue; > > > > it would be more generic if this symbol type filter was a parameter to > > iterator, instead of hard-coding it? > > ok > > > > > > + name = elf_strptr(iter->elf, iter->strtabidx, sym.st_name); > > > + if (!name) > > > + continue; > > > + > > > + /* Transform symbol's virtual address (absolute for > > > + * binaries and relative for shared libs) into file > > > + * offset, which is what kernel is expecting for > > > + * uprobe/uretprobe attachment. > > > + * See Documentation/trace/uprobetracer.rst for more > > > + * details. > > > + * This is done by looking up symbol's containing > > > + * section's header and using iter's virtual address > > > + * (sh_addr) and corresponding file offset (sh_offset) > > > + * to transform sym.st_value (virtual address) into > > > + * desired final file offset. > > > + */ > > > + sym_scn = elf_getscn(iter->elf, sym.st_shndx); > > > + if (!sym_scn) > > > + continue; > > > + if (!gelf_getshdr(sym_scn, &sym_sh)) > > > + continue; > > > + > > > + offset = sym.st_value - sym_sh.sh_addr + sym_sh.sh_offset; > > > > I think this part is not really generic "let's iterate ELF symbols", > > maybe let users of iterator do this? We can have a helper to do > > translation if we need to do it in few different places. > > yes this will be called in all the places we use the iterator, > I'll add the helper for it > > > > > > + break; > > > + } > > > + > > > + /* we reached the last symbol */ > > > + if (idx == iter->nr_syms) > > > + return NULL; > > > + iter->idx = idx + 1; > > > + ret->name = name; > > > + ret->bind = GELF_ST_BIND(sym.st_info); > > > + ret->offset = offset; > > > > Why not just return entire GElf_Sym information and let user process > > it as desired. So basically for each symbol you'll give back its name, > > GElf_Sym info, and I'd return symbol index as well. That will keep > > this very generic for future uses. > > ok, so you have other users of this iterator in mind already? well, there is linker.c that also iterates ELF symbols, though that one is assuming Elf64_Sym, so I wouldn't go updating it. So it's more of a general feeling that "ELF symbol iterator" shouldn't assume functions and func_offset translation, it should just return symbols. > > > > > > + return ret; > > > > I'd structure this a bit different. If we got out of loop, just return > > NULL. Then inside the for loop, when we found the symbol, fill out ret > > and return from inside the for loop. I think it's more > > straightforward. > > ok, will change > > thanks, > jirka > > > > > > +} > > > + > > > /* Find offset of function name in the provided ELF object. "binary_path" is > > > * the path to the ELF binary represented by "elf", and only used for error > > > * reporting matters. "name" matches symbol name or name@@LIB for library > > > > [...]