On Tue, Apr 02, 2024 at 04:39:33PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi,
>
> This allows us to have reproducible builds while keeping the
> DWARF loading phase in parallel, achieving a noticeable speedup as
> showed in the commit log messages:
>
> On a:
>
>   model name    : Intel(R) Core(TM) i7-14700K
>
>   8 performance cores (16 threads), 12 efficiency cores.
>
> Serial encoding:
>
>   $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>              5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )
>
> Parallel, but non-reproducible:
>
>   $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>               1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )
>
> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
>
>   $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>               2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )

hm, got it even faster than parallel build ;-) but it's within the 1 second
deviation, I guess it shows better on bigger kernels

reproducible_build:

  [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux

   Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

     295,519,117,258      cycles                  ( +- 19.48% )

                9.43 +- 1.02 seconds time elapsed  ( +- 10.84% )

parallel build:

  [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j --btf_encode_detached=btf.2 ./vmlinux

   Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

     391,320,973,331      cycles                  ( +- 19.19% )

               9.851 +- 0.695 seconds time elapsed  ( +-  7.06% )

1 cpu:

  [root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux

   Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

     208,492,342,135      cycles                  ( +- 19.43% )

              16.761 +- 0.644 seconds time elapsed  ( +-  3.84% )

jirka

>
> Please take a look, its in the 'next' branch at:
>
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
>
> There is a new tool to do regression testing on this feature:
>
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>
> And here a more detailed set of tests using it:
>
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
>
> Working on libbpf to allow for parallel reproducible BTF encoding is the
> next step.
>
> Thanks a lot,
>
> - Arnaldo
>
> Arnaldo Carvalho de Melo (12):
>   core: Allow asking for a reproducible build
>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>   dwarf_loader: Introduce dwarf_cus__process_cu()
>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>   core: Add unlocked cus__add() variant
>   core: Add cus__remove(), counterpart of cus__add()
>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>   core/dwarf_loader: Add functions to set state of CU processing
>   pahole: Encode BTF serially in a reproducible build
>   tests: Add a BTF reproducible generation test
>
>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>  dwarves.h                   | 17 ++++++++
>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>  5 files changed, 264 insertions(+), 24 deletions(-)
>  create mode 100755 tests/reproducible_build.sh
>
> --
> 2.44.0
>
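
For anyone who wants to check the "reproducible" part locally as well as the timing, here is a minimal sketch of a byte-for-byte comparison across repeated runs. It is not the contents of tests/reproducible_build.sh; the helper name outputs_identical and the convention that the command receives the output path as its last argument are assumptions made for this illustration.

```shell
# Hypothetical helper (not taken from pahole's tests/reproducible_build.sh).
# Usage: outputs_identical RUNS CMD [ARGS...]
# CMD is invoked RUNS times with an output file path appended as its last
# argument. Exit status: 0 if every output is byte-identical to the first,
# 1 if any output differs, 2 if CMD itself fails.
outputs_identical() (
    runs=$1; shift
    dir=$(mktemp -d) || exit 2
    status=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Produce the output for this run.
        "$@" "$dir/run.$i" || { status=2; break; }
        # Compare every later run against the first one.
        if [ "$i" -gt 0 ] && ! cmp -s "$dir/run.0" "$dir/run.$i"; then
            status=1
            break
        fi
        i=$((i + 1))
    done
    rm -rf "$dir"
    exit "$status"
)

# Example with the flags quoted in this thread (assumes pahole and ./vmlinux
# are available; left commented out so the sketch runs without them):
#
#   encode() { pahole -j --reproducible_build --btf_encode_detached="$1" ./vmlinux; }
#   outputs_identical 3 encode && echo "BTF identical across runs"
```

Dropping --reproducible_build from the wrapper would be the natural counter-check, though a parallel run producing identical output a few times in a row does not prove it is deterministic.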