On 04/04/2024 09:58, Alan Maguire wrote:
> On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote:
>> Hi,
>>
>> This allows us to have reproducible builds while keeping the
>> DWARF loading phase in parallel, achieving a noticeable speedup as
>> showed in the commit log messages:
>>
>> On a:
>>
>> model name : Intel(R) Core(TM) i7-14700K
>>
>> 8 performance cores (16 threads), 12 efficiency cores.
>>
>> Serial encoding:
>>
>> $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>> 5.18276 +- 0.00952 seconds time elapsed ( +- 0.18% )
>>
>> Parallel, but non-reproducible:
>>
>> $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>> 1.8529 +- 0.0159 seconds time elapsed ( +- 0.86% )
>>
>> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
>>
>> $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>> 2.3632 +- 0.0164 seconds time elapsed ( +- 0.69% )
>>
>> Please take a look, its in the 'next' branch at:
>>
>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
>>
>> There is a new tool to do regression testing on this feature:
>>
>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>>
>> And here a more detailed set of tests using it:
>>
>> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
>>
>> Working on libbpf to allow for parallel reproducible BTF encoding is the
>> next step.
>>
>> Thanks a lot,
>>
>
> Hey Arnaldo
>
> In testing this series I've hit a segmentation fault:
>
> Using host libthread_db library "/usr/lib64/libthread_db.so.1".
> Core was generated by `pahole -J --btf_features=all --reproducible_build
> -j vmlinux'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612 return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) bt
> #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> #1 0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:806
> #2 0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30,
>     cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0,
>     is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1388
> #3 0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30,
>     cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1183
> #4 0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30,
>     proto=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/btf_encoder.c:811
> #5 0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220,
>     func=0x7f8c27225d88, f2=0x7f8c5805c560)
>     at /home/almagui/src/dwarves/btf_encoder.c:839
> #6 0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220,
>     fn=0x7f8c5805c560, func=0x7f8c27225d88)
>     at /home/almagui/src/dwarves/btf_encoder.c:871
> #7 0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220,
>     cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>)
>     at /home/almagui/src/dwarves/btf_encoder.c:1888
> #8 0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20,
>     conf_load=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/pahole.c:3342
> #9 0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3029
> #10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3036
> #11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0,
>     cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3243
> #12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3313
> #13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0
> #14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6
>
> So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part
> of "all" and enabled for vmlinux/module BTF), we use dwarves_fprintf()
> to write prototypes to check for inconsistent definitions.
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612 return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) print *(struct ptr_table *)0x7f8c60001e70
> $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> (gdb)
>
> So it looks like the ptr_table has 2979 entries but entries is NULL;
> could there be an issue where CU initialization is not yet complete
> for some threads (it also happens very early in processing)? Can you
> reproduce this failure at your end? Thanks!
>

the following (when applied on top of the series) resolves the
segmentation fault for me:

diff --git a/pahole.c b/pahole.c
index 6c7e738..5ff0eaf 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                 if (conf_load->reproducible_build) {
                         ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
-                        // Equivalent to LSK__DELETE since we processed this
-                        cus__remove(cus, cu);
-                        cu__delete(cu);
                 }
 out_btf:
         if (!thr_data) // See comment about reproducibe_build above

In other words, the problem is that we remove/delete CUs when finished
with them in each thread (when BTF is generated). However, because
save/add_saved_funcs stashes CU references in the associated
struct function * (to allow prototype comparison for the same function
in different CUs), we end up with stale CU references, and in this case
the freed/nulled ptr_table caused an issue. As far as I can see we need
to retain CUs until all BTF has been merged from threads.

With the fix in place, I'm seeing less than 100 msec difference between
reproducible/non-reproducible vmlinux BTF generation; that's great!

Alan

> Alan
>
>> - Arnaldo
>>
>> Arnaldo Carvalho de Melo (12):
>>   core: Allow asking for a reproducible build
>>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>>   dwarf_loader: Introduce dwarf_cus__process_cu()
>>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>>   core: Add unlocked cus__add() variant
>>   core: Add cus__remove(), counterpart of cus__add()
>>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>>   core/dwarf_loader: Add functions to set state of CU processing
>>   pahole: Encode BTF serially in a reproducible build
>>   tests: Add a BTF reproducible generation test
>>
>>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>>  dwarves.h                   | 17 ++++++++
>>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>>  5 files changed, 264 insertions(+), 24 deletions(-)
>>  create mode 100755 tests/reproducible_build.sh
>>
>
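
P.S. For anyone following the thread, the lifetime issue described in the
reply above (saved functions keep a CU pointer, so the CU must outlive
them) can be sketched roughly as below. This is only a toy model under my
own assumptions, not the pahole code: every toy_* name is hypothetical,
and the real struct cu, ptr_table and cus list are considerably more
involved. The idea it shows is simply "park finished CUs on a list and
free them only once nothing can look at them any more".

```c
/* Toy model of deferred CU deletion. All toy_* names are hypothetical
 * and are not part of dwarves/pahole. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_cu {
	void **entries;          /* type table, loosely like a ptr_table */
	unsigned int nr_entries;
	struct toy_cu *next;     /* link in the deferred-release list */
};

struct toy_func {
	struct toy_cu *cu;       /* back-reference kept for later comparison */
	unsigned int type_id;
};

struct toy_cus {
	struct toy_cu *deferred; /* CUs that are done but may still be referenced */
	unsigned int nr_deferred;
};

static struct toy_cu *toy_cu__new(unsigned int nr_entries)
{
	struct toy_cu *cu = calloc(1, sizeof(*cu));

	if (cu == NULL)
		return NULL;
	/* Allocate at least one slot so a zero-sized request is not an issue. */
	cu->entries = calloc(nr_entries ? nr_entries : 1, sizeof(*cu->entries));
	if (cu->entries == NULL) {
		free(cu);
		return NULL;
	}
	cu->nr_entries = nr_entries;
	return cu;
}

/* Bounds-checked lookup; only safe while the CU is still alive. */
static void *toy_cu__type(const struct toy_cu *cu, unsigned int id)
{
	return id >= cu->nr_entries ? NULL : cu->entries[id];
}

/* Instead of deleting a finished CU right away, park it. */
static void toy_cus__defer(struct toy_cus *cus, struct toy_cu *cu)
{
	cu->next = cus->deferred;
	cus->deferred = cu;
	cus->nr_deferred++;
}

/* Free all parked CUs; call only after every user of them is finished. */
static unsigned int toy_cus__release_all(struct toy_cus *cus)
{
	unsigned int freed = 0;

	while (cus->deferred != NULL) {
		struct toy_cu *cu = cus->deferred;

		cus->deferred = cu->next;
		free(cu->entries);
		free(cu);
		freed++;
	}
	cus->nr_deferred = 0;
	return freed;
}

/* Returns 0 if the saved function could still resolve its type after the
 * CU was "finished", and the CU was then released exactly once. */
static int toy_demo(void)
{
	static int some_type;
	struct toy_cus cus = { NULL, 0 };
	struct toy_cu *cu = toy_cu__new(128);
	struct toy_func saved;
	unsigned int freed;
	int ok;

	if (cu == NULL)
		return -1;
	cu->entries[77] = &some_type;
	saved.cu = cu;
	saved.type_id = 77;

	toy_cus__defer(&cus, cu);  /* CU is finished, but kept alive */

	/* A later prototype comparison can still dereference saved.cu. */
	ok = toy_cu__type(saved.cu, saved.type_id) == &some_type;
	assert(ok);

	freed = toy_cus__release_all(&cus);  /* everything has been merged */
	return (ok && freed == 1) ? 0 : -1;
}
```

Calling toy_demo() from a small main() should return 0. Freeing the CU at
the toy_cus__defer() point instead would leave saved.cu dangling, which is
the same shape of problem as the stale cu seen in frame #1 of the
backtrace.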