This is v2 of the patchset that aims to speed up parallel BTF encoding
when the reproducible_build flag is set (see link [1]).

In comparison to v1:
  * patch #2, which added section-relative addresses to elf_functions,
    is removed as unrelated [2]
  * patch #9 [3] is replaced with patches #8, #9 and #10 (the biggest
    and most important in this series)

Patch #10 rewrites the multithreading implementation to a job/worker
model. See the commit message for details.

./tests/tests pass with a vmlinux built on bpf-next. I also confirmed
that the reproducible bpftool dump of the BTF produced for vmlinux is
identical between this patch series and pahole/next.

With this patch series, the performance of parallel BTF encoding is
comparable to non-reproducible runs on pahole/next. Depending on the
number of threads and the allowed memory usage (indirectly controlled by
the max_decoded_cus parameter of the queue in dwarf_loader.c), it may be
a little slower or a little faster.

Note that, as reported by perf for -j24, the number of CPU cycles is
significantly lower than for the non-reproducible pahole/next run (about
46.8G vs 60.4G cycles:u), although the wall-clock time is somewhat
greater (2.37 sec vs 2.20 sec).

See sample measurements below (host nproc=24).

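
As background for the job/worker model mentioned for patch #10, here is
a minimal generic sketch of the pattern using pthreads. It is NOT the
code from dwarf_loader.c: struct job_pool, process_job(), worker_main()
and run_jobs() are hypothetical names made up for illustration. Workers
claim job ids from a shared counter, and each result is stored in a slot
indexed by job id, so the output order does not depend on which worker
finishes first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job pool; names are illustrative, not taken from pahole. */
struct job_pool {
	pthread_mutex_t lock;
	int next_job;	/* id of the next unclaimed job */
	int nr_jobs;	/* total number of jobs */
	long *results;	/* results[id] is written by the worker that ran job id */
};

/* Stand-in for real work, e.g. processing one compilation unit. */
static long process_job(int id)
{
	return (long)id * id;
}

static void *worker_main(void *arg)
{
	struct job_pool *pool = arg;

	for (;;) {
		int id;

		/* Claim the next job id under the lock. */
		pthread_mutex_lock(&pool->lock);
		id = pool->next_job < pool->nr_jobs ? pool->next_job++ : -1;
		pthread_mutex_unlock(&pool->lock);

		if (id < 0)
			break;	/* no jobs left */

		assert(id < pool->nr_jobs);
		/* Each job owns its own slot, so no lock is needed here. */
		pool->results[id] = process_job(id);
	}
	return NULL;
}

/* Run nr_jobs jobs on nr_workers threads. Returns 0 on success, -1 on error. */
static int run_jobs(int nr_jobs, int nr_workers, long *results)
{
	struct job_pool pool = { .next_job = 0, .nr_jobs = nr_jobs, .results = results };
	pthread_t *threads;
	int started = 0, err = 0;

	if (nr_jobs < 0 || nr_workers < 1)
		return -1;

	threads = calloc(nr_workers, sizeof(*threads));
	if (!threads)
		return -1;
	if (pthread_mutex_init(&pool.lock, NULL)) {
		free(threads);
		return -1;
	}

	for (; started < nr_workers; started++) {
		if (pthread_create(&threads[started], NULL, worker_main, &pool)) {
			err = -1;
			break;
		}
	}
	/* Join only the threads that were actually started. */
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&pool.lock);
	free(threads);
	return err;
}
```

Calling run_jobs(8, 3, results) fills results[i] with i * i for every
job id, whatever the scheduling. Build with -pthread.
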
This patch (always reproducible)
-j1  mem  842020 Kb, time 6.31 sec
-j3  mem  864604 Kb, time 2.90 sec
-j6  mem  927760 Kb, time 2.21 sec
-j12 mem 1026616 Kb, time 2.29 sec
-j24 mem 1188448 Kb, time 2.36 sec
-j48 mem 1462656 Kb, time 2.48 sec

 Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

    46,771,092,586      cycles:u                ( +-  0.17% )

           2.36785 +- 0.00503 seconds time elapsed  ( +-  0.21% )

pahole/next (1cb4202) non-reproducible
-j1  mem  834004 Kb, time 6.25 sec
-j3  mem  976480 Kb, time 3.21 sec
-j6  mem 1081432 Kb, time 2.36 sec
-j12 mem 1161252 Kb, time 2.07 sec
-j24 mem 1303060 Kb, time 2.13 sec
-j48 mem 1537800 Kb, time 2.39 sec

 Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

    60,436,382,442      cycles:u                ( +-  0.22% )

            2.2024 +- 0.0151 seconds time elapsed  ( +-  0.68% )

pahole/next (1cb4202) reproducible
-j1  mem 4745764 Kb, time 7.64 sec
-j3  mem 4744556 Kb, time 3.95 sec
-j6  mem 4744592 Kb, time 2.98 sec
-j12 mem 4744680 Kb, time 2.99 sec
-j24 mem 4745252 Kb, time 2.99 sec
-j48 mem 4744520 Kb, time 2.98 sec

 Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

    38,155,725,721      cycles:u                ( +-  0.29% )

           3.00290 +- 0.00501 seconds time elapsed  ( +-  0.17% )

[1] https://lore.kernel.org/dwarves/20241128012341.4081072-1-ihor.solodrai@xxxxx/
[2] https://lore.kernel.org/dwarves/20241128012341.4081072-3-ihor.solodrai@xxxxx/
[3] https://lore.kernel.org/dwarves/20241128012341.4081072-10-ihor.solodrai@xxxxx/

Alan Maguire (2):
  btf_encoder: simplify function encoding
  btf_encoder: separate elf function, saved function representations

Ihor Solodrai (8):
  dwarf_loader: introduce pre_load_module hook to conf_load
  btf_encoder: introduce elf_functions struct type
  btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  btf_encoder: switch to shared elf_functions table
  btf_encoder: introduce btf_encoding_context
  btf_encoder: remove skip_encoding_inconsistent_proto
  dwarf_loader: introduce cu->id
  dwarf_loader: multithreading with a job/worker model

 btf_encoder.c               | 639 +++++++++++++++++++++---------------
 btf_encoder.h               |   8 +-
 btf_loader.c                |   2 +-
 ctf_loader.c                |   2 +-
 dwarf_loader.c              | 352 ++++++++++++++------
 dwarves.c                   |  44 ---
 dwarves.h                   |  21 +-
 pahole.c                    | 237 +++----------
 pdwtags.c                   |   3 +-
 pfunct.c                    |   3 +-
 tests/reproducible_build.sh |   5 +-
 11 files changed, 685 insertions(+), 631 deletions(-)

-- 
2.47.1