On Thu, Jun 22, 2023 at 05:31:58PM -0700, Andrii Nakryiko wrote:
> On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> >
> > Adding elf symbol iterator object (and some functions) that follow
> > open-coded iterator pattern and some functions to ease up iterating
> > elf object symbols.
> >
> > The idea is to iterate single symbol section with:
> >
> >   struct elf_symbol_iter iter;
> >   struct elf_symbol *sym;
> >
> >   if (elf_symbol_iter_new(&iter, elf, binary_path, SHT_DYNSYM))
> >         goto error;
> >
> >   while ((sym = elf_symbol_iter_next(&iter))) {
> >         ...
> >   }
> >
> > I considered opening the elf inside the iterator and iterating all symbol
> > sections, but then it gets more complicated wrt user checks for when
> > the next section is processed.
> >
> > The plus side is that we don't need an 'exit' function, because caller/user
> > is in charge of that.
> >
> > The returned iterated symbol object from elf_symbol_iter_next function
> > is placed inside the struct elf_symbol_iter, so no extra allocation or
> > argument is needed.
> >
> > Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> > ---
> >  tools/lib/bpf/libbpf.c | 179 ++++++++++++++++++++++++++---------------
> >  1 file changed, 114 insertions(+), 65 deletions(-)
> >
>
> This is great. Left a few nits below. I'm thinking maybe we should add
> a separate elf.c file for all these ELF-related helpers and start
> offloading code from libbpf.c, which got pretty big already. WDYT?
yes, I thought doing the move after this is merged might be better,
because it's quite big already

> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index af52188daa80..cdac368c7ce1 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -10824,6 +10824,109 @@ static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn)
> >  	return NULL;
> >  }
> >
> > +struct elf_symbol {
> > +	const char *name;
> > +	unsigned long offset;
> > +	int bind;
> > +};
> > +
> > +struct elf_symbol_iter {
>
> naming nits: elf_sym and elf_sym_iter? keep it short, keep it cool :)

ok

>
> > +	Elf *elf;
> > +	Elf_Data *symbols;
>
> syms :-P

ook ;-)

>
> > +	size_t nr_syms;
> > +	size_t strtabidx;
> > +	size_t idx;
>
> next_sym_idx?

ok

>
> > +	struct elf_symbol sym;
> > +};
> > +
> > +static int elf_symbol_iter_new(struct elf_symbol_iter *iter,
> > +			       Elf *elf, const char *binary_path,
> > +			       int sh_type)
> > +{
> > +	Elf_Scn *scn = NULL;
> > +	GElf_Ehdr ehdr;
> > +	GElf_Shdr sh;
> > +
> > +	memset(iter, 0, sizeof(*iter));
> > +
> > +	if (!gelf_getehdr(elf, &ehdr)) {
> > +		pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1));
> > +		return -LIBBPF_ERRNO__FORMAT;
> > +	}
> > +
> > +	scn = elf_find_next_scn_by_type(elf, sh_type, NULL);
> > +	if (!scn) {
> > +		pr_debug("elf: failed to find symbol table ELF sections in '%s'\n",
> > +			 binary_path);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!gelf_getshdr(scn, &sh))
> > +		return -EINVAL;
> > +
> > +	iter->strtabidx = sh.sh_link;
> > +	iter->symbols = elf_getdata(scn, 0);
> > +	if (!iter->symbols) {
> > +		pr_warn("elf: failed to get symbols for symtab section in '%s': %s\n",
> > +			binary_path, elf_errmsg(-1));
> > +		return -LIBBPF_ERRNO__FORMAT;
> > +	}
> > +	iter->nr_syms = iter->symbols->d_size / sh.sh_entsize;
> > +	iter->elf = elf;
> > +	return 0;
> > +}
> > +
> > +static struct elf_symbol *elf_symbol_iter_next(struct elf_symbol_iter *iter)
> > +{
> > +	struct elf_symbol *ret = &iter->sym;
> > +	unsigned long offset = 0;
> > +	const char *name = NULL;
> > +	GElf_Shdr sym_sh;
> > +	Elf_Scn *sym_scn;
> > +	GElf_Sym sym;
> > +	size_t idx;
> > +
> > +	for (idx = iter->idx; idx < iter->nr_syms; idx++) {
> > +		if (!gelf_getsym(iter->symbols, idx, &sym))
> > +			continue;
> > +		if (GELF_ST_TYPE(sym.st_info) != STT_FUNC)
> > +			continue;
>
> it would be more generic if this symbol type filter was a parameter to
> iterator, instead of hard-coding it?

ok

>
> > +		name = elf_strptr(iter->elf, iter->strtabidx, sym.st_name);
> > +		if (!name)
> > +			continue;
> > +
> > +		/* Transform symbol's virtual address (absolute for
> > +		 * binaries and relative for shared libs) into file
> > +		 * offset, which is what kernel is expecting for
> > +		 * uprobe/uretprobe attachment.
> > +		 * See Documentation/trace/uprobetracer.rst for more
> > +		 * details.
> > +		 * This is done by looking up symbol's containing
> > +		 * section's header and using iter's virtual address
> > +		 * (sh_addr) and corresponding file offset (sh_offset)
> > +		 * to transform sym.st_value (virtual address) into
> > +		 * desired final file offset.
> > +		 */
> > +		sym_scn = elf_getscn(iter->elf, sym.st_shndx);
> > +		if (!sym_scn)
> > +			continue;
> > +		if (!gelf_getshdr(sym_scn, &sym_sh))
> > +			continue;
> > +
> > +		offset = sym.st_value - sym_sh.sh_addr + sym_sh.sh_offset;
>
> I think this part is not really generic "let's iterate ELF symbols",
> maybe let users of iterator do this? We can have a helper to do
> translation if we need to do it in a few different places.

yes this will be called in all the places we use the iterator,
I'll add the helper for it

>
> > +		break;
> > +	}
> > +
> > +	/* we reached the last symbol */
> > +	if (idx == iter->nr_syms)
> > +		return NULL;
> > +	iter->idx = idx + 1;
> > +	ret->name = name;
> > +	ret->bind = GELF_ST_BIND(sym.st_info);
> > +	ret->offset = offset;
>
> Why not just return entire GElf_Sym information and let user process
> it as desired.
> So basically for each symbol you'll give back its name,
> GElf_Sym info, and I'd return symbol index as well. That will keep
> this very generic for future uses.

ok, so you have other users of this iterator in mind already?

>
> > +	return ret;
>
> I'd structure this a bit differently. If we got out of loop, just return
> NULL. Then inside the for loop, when we found the symbol, fill out ret
> and return from inside the for loop. I think it's more
> straightforward.

ok, will change

thanks,
jirka

>
> > +}
> > +
> >  /* Find offset of function name in the provided ELF object. "binary_path" is
> >   * the path to the ELF binary represented by "elf", and only used for error
> >   * reporting matters. "name" matches symbol name or name@@LIB for library
>
> [...]
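The thread settles on several concrete changes for the next revision. These are shorter names, a caller-supplied symbol-type filter, and returning the raw symbol together with its name and index. The result is filled in and returned from inside the loop, and the virtual-address-to-file-offset translation moves into a separate helper. The following is a minimal sketch of how those pieces might fit together. It is not the follow-up patch. To keep it buildable without libelf, the hypothetical `toy_sym` and `toy_shdr` types stand in for `GElf_Sym` and `GElf_Shdr`. Symbol names are assumed to be already resolved (real code would call `elf_strptr()`). `elf_sym_offset()` is an illustrative name for the helper Jiri mentions, not an existing libbpf function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror ELF's STT_OBJECT and STT_FUNC. */
#define TOY_STT_OBJECT 1
#define TOY_STT_FUNC   2

/* Hypothetical stand-in for GElf_Sym, so the sketch builds without libelf. */
struct toy_sym {
	const char *name;   /* assumed pre-resolved; real code uses elf_strptr() */
	int type;           /* GELF_ST_TYPE(st_info) in real code */
	uint64_t st_value;  /* symbol virtual address */
	size_t st_shndx;    /* index of the containing section */
};

/* Hypothetical stand-in for GElf_Shdr: only the fields the translation needs. */
struct toy_shdr {
	uint64_t sh_addr;   /* section virtual address */
	uint64_t sh_offset; /* section file offset */
};

/* Iterator element: name, raw symbol and its index, as suggested in review. */
struct elf_sym {
	const char *name;
	const struct toy_sym *sym;
	size_t idx;
};

struct elf_sym_iter {
	const struct toy_sym *syms;
	size_t nr_syms;
	size_t next_sym_idx;
	int st_type;            /* symbol type filter, now a parameter */
	struct elf_sym sym;     /* storage for the returned element */
};

static void elf_sym_iter_new(struct elf_sym_iter *iter,
			     const struct toy_sym *syms, size_t nr_syms,
			     int st_type)
{
	/* Compound literal zeroes the remaining fields, like memset() would. */
	*iter = (struct elf_sym_iter){
		.syms = syms,
		.nr_syms = nr_syms,
		.st_type = st_type,
	};
}

static struct elf_sym *elf_sym_iter_next(struct elf_sym_iter *iter)
{
	size_t idx;

	for (idx = iter->next_sym_idx; idx < iter->nr_syms; idx++) {
		const struct toy_sym *s = &iter->syms[idx];

		if (s->type != iter->st_type || !s->name)
			continue;

		/* Found a match: fill the result and return from inside the loop. */
		iter->next_sym_idx = idx + 1;
		iter->sym.name = s->name;
		iter->sym.sym = s;
		iter->sym.idx = idx;
		return &iter->sym;
	}
	/* Fell out of the loop: no more matching symbols. */
	return NULL;
}

/* Hypothetical helper: translate a symbol's virtual address into a file
 * offset using its containing section's sh_addr and sh_offset. */
static uint64_t elf_sym_offset(const struct toy_sym *sym,
			       const struct toy_shdr *sh)
{
	assert(sym->st_value >= sh->sh_addr);
	return sym->st_value - sh->sh_addr + sh->sh_offset;
}
```

A caller would keep the loop shape from the cover letter, `while ((s = elf_sym_iter_next(&it)))`. It would look up the containing section by `st_shndx` (via `elf_getscn()` and `gelf_getshdr()` in real code) only when it actually needs a file offset, for example for uprobe attachment. With `.text` at virtual address 0x401000 and file offset 0x1000, a function at 0x401130 translates to file offset 0x1130.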