On Wed, Mar 31, 2021 at 7:58 PM Yonghong Song <yhs@xxxxxx> wrote: > > Commit 39227909db3c checked compilation flags to see > whether the binary is built with lto or not (-flto). > Currently, for clang lto build, default setting > won't put compilation flags in dwarf due to size > concern. > > David Blaikie suggested in [1] to scan through .debug_abbrev > for DW_FORM_ref_addr which should be most faster than > scanning through cu's. This patch implemented this > suggestion and replaced the previous compilation flag > matching approach. Indeed, it seems that the overhead > for this approach is indeed managable. > > I did some change to measure the overhead of cus_merging_cu(): > @@ -2650,7 +2652,15 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf, > } > } > > - if (cus__merging_cu(dw)) { > + bool do_merging; > + struct timeval start, end; > + gettimeofday(&start, NULL); > + do_merging = cus__merging_cu(dw); > + gettimeofday(&end, NULL); > + fprintf(stderr, "%ld %ld -> %ld %ld\n", start.tv_sec, start.tv_usec, > + end.tv_sec, end.tv_usec); > + > + if (do_merging) { > res = cus__merge_and_process_cu(cus, conf, mod, dw, elf, filename, > build_id, build_id_len, > type_cu ? &type_dcu : NULL); > > For lto vmlinux, the cus__merging_cu() checking takes > 130us over total "pahole -J vmlinux" time 65sec as the function bail out > earlier due to detecting a merging cu condition. > For non-lto vmlinux, the cus__merging_cu() checking takes > ~171368us over total pahole time 36sec, roughly 0.5% overhead. > > [1] https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@xxxxxx/ > It might be a nice little touch to add: Suggested-by: David Blaikie <blaikie@xxxxxxxxxx> > Signed-off-by: Yonghong Song <yhs@xxxxxx> > --- > dwarf_loader.c | 43 ++++++++++++++++++++++++------------------- > 1 file changed, 24 insertions(+), 19 deletions(-) > > diff --git a/dwarf_loader.c b/dwarf_loader.c > index c1ca1a3..bd23751 100644 > --- a/dwarf_loader.c > +++ b/dwarf_loader.c > @@ -2503,35 +2503,40 @@ static int cus__load_debug_types(struct cus *cus, struct conf_load *conf, > > static bool cus__merging_cu(Dwarf *dw) > { > - uint8_t pointer_size, offset_size; > Dwarf_Off off = 0, noff; > size_t cuhl; > - int cnt = 0; > > - /* > - * Just checking the first cu is not enough. > - * In linux, some C files may have LTO is disabled, e.g., > - * e242db40be27 x86, vdso: disable LTO only for vDSO > - * d2dcd3e37475 x86, cpu: disable LTO for cpu.c > - * Fortunately, disabling LTO for a particular file in a LTO build > - * is rather an exception. Iterating 5 cu's to check whether > - * LTO is used or not should be enough. > - */ > - while (dwarf_nextcu(dw, off, &noff, &cuhl, NULL, &pointer_size, > - &offset_size) == 0) { > + while (dwarf_nextcu (dw, off, &noff, &cuhl, NULL, NULL, NULL) == 0) { > Dwarf_Die die_mem; > Dwarf_Die *cu_die = dwarf_offdie(dw, off + cuhl, &die_mem); > > if (cu_die == NULL) > break; > > - if (++cnt > 5) > - break; > + Dwarf_Off offset = 0; > + while (true) { > + size_t length; > + Dwarf_Abbrev *abbrev = dwarf_getabbrev (cu_die, offset, &length); > + if (abbrev == NULL || abbrev == DWARF_END_ABBREV) > + break; > > - const char *producer = attr_string(cu_die, DW_AT_producer); > - if (strstr(producer, "clang version") != NULL && > - strstr(producer, "-flto") != NULL) > - return true; > + size_t attrcnt; > + if (dwarf_getattrcnt (abbrev, &attrcnt) != 0) > + return false; > + > + unsigned int attr_num, attr_form; > + Dwarf_Off aboffset; > + size_t j; > + for (j = 0; j < attrcnt; ++j) { > + if (dwarf_getabbrevattr (abbrev, j, &attr_num, &attr_form, > + &aboffset)) > + return false; > + if (attr_form == DW_FORM_ref_addr) > + return true; > + } > + > + offset += length; > + } > > off = noff; > } > -- > 2.30.2 > -- Thanks, ~Nick Desaulniers