Commit 39227909db3c checked compilation flags to see whether the binary is built with lto or not (-flto). Currently, for clang lto build, default setting won't put compilation flags in dwarf due to size concern. David Blaikie suggested in [1] to scan through .debug_abbrev for DW_FORM_ref_addr which should be most faster than scanning through cu's. This patch implemented this suggestion and replaced the previous compilation flag matching approach. Indeed, it seems that the overhead for this approach is indeed managable. I did some change to measure the overhead of cus_merging_cu(): @@ -2650,7 +2652,15 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf, } } - if (cus__merging_cu(dw)) { + bool do_merging; + struct timeval start, end; + gettimeofday(&start, NULL); + do_merging = cus__merging_cu(dw); + gettimeofday(&end, NULL); + fprintf(stderr, "%ld %ld -> %ld %ld\n", start.tv_sec, start.tv_usec, + end.tv_sec, end.tv_usec); + + if (do_merging) { res = cus__merge_and_process_cu(cus, conf, mod, dw, elf, filename, build_id, build_id_len, type_cu ? &type_dcu : NULL); For lto vmlinux, the cus__merging_cu() checking takes 130us over total "pahole -J vmlinux" time 65sec as the function bail out earlier due to detecting a merging cu condition. For non-lto vmlinux, the cus__merging_cu() checking takes ~171368us over total pahole time 36sec, roughly 0.5% overhead. [1] https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@xxxxxx/ Signed-off-by: Yonghong Song <yhs@xxxxxx> --- dwarf_loader.c | 43 ++++++++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 19 deletions(-) diff --git a/dwarf_loader.c b/dwarf_loader.c index c1ca1a3..bd23751 100644 --- a/dwarf_loader.c +++ b/dwarf_loader.c @@ -2503,35 +2503,40 @@ static int cus__load_debug_types(struct cus *cus, struct conf_load *conf, static bool cus__merging_cu(Dwarf *dw) { - uint8_t pointer_size, offset_size; Dwarf_Off off = 0, noff; size_t cuhl; - int cnt = 0; - /* - * Just checking the first cu is not enough. - * In linux, some C files may have LTO is disabled, e.g., - * e242db40be27 x86, vdso: disable LTO only for vDSO - * d2dcd3e37475 x86, cpu: disable LTO for cpu.c - * Fortunately, disabling LTO for a particular file in a LTO build - * is rather an exception. Iterating 5 cu's to check whether - * LTO is used or not should be enough. - */ - while (dwarf_nextcu(dw, off, &noff, &cuhl, NULL, &pointer_size, - &offset_size) == 0) { + while (dwarf_nextcu (dw, off, &noff, &cuhl, NULL, NULL, NULL) == 0) { Dwarf_Die die_mem; Dwarf_Die *cu_die = dwarf_offdie(dw, off + cuhl, &die_mem); if (cu_die == NULL) break; - if (++cnt > 5) - break; + Dwarf_Off offset = 0; + while (true) { + size_t length; + Dwarf_Abbrev *abbrev = dwarf_getabbrev (cu_die, offset, &length); + if (abbrev == NULL || abbrev == DWARF_END_ABBREV) + break; - const char *producer = attr_string(cu_die, DW_AT_producer); - if (strstr(producer, "clang version") != NULL && - strstr(producer, "-flto") != NULL) - return true; + size_t attrcnt; + if (dwarf_getattrcnt (abbrev, &attrcnt) != 0) + return false; + + unsigned int attr_num, attr_form; + Dwarf_Off aboffset; + size_t j; + for (j = 0; j < attrcnt; ++j) { + if (dwarf_getabbrevattr (abbrev, j, &attr_num, &attr_form, + &aboffset)) + return false; + if (attr_form == DW_FORM_ref_addr) + return true; + } + + offset += length; + } off = noff; } -- 2.30.2