Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding

On 04/04/2024 09:58, Alan Maguire wrote:
> On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote:
>> Hi,
>>
>> 	This allows us to have reproducible builds while keeping the
>> DWARF loading phase in parallel, achieving a noticeable speedup as
>> shown in the commit log messages:
>>
>> On a:
>>
>>   model name    : Intel(R) Core(TM) i7-14700K
>>
>>   8 performance cores (16 threads), 12 efficiency cores.
>>
>> Serial encoding:
>>
>>   $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>>              5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )
>>
>> Parallel, but non-reproducible:
>>
>>   $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>>               1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )
>>
>> Reproducible build, using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
>>
>>   $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>>               2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )
>>
>> Please take a look, it's in the 'next' branch at:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
>>
>> There is a new tool to do regression testing on this feature:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>>   
>> And here a more detailed set of tests using it:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
>>
>> Working on libbpf to allow for parallel reproducible BTF encoding is the
>> next step.
>>
>> Thanks a lot,
>>
> 
> Hey Arnaldo
> 
> In testing this series I've hit a segmentation fault:
> 
> Using host libthread_db library "/usr/lib64/libthread_db.so.1".
> Core was generated by `pahole -J --btf_features=all --reproducible_build -j vmlinux'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) bt
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> #1  0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:806
> #2  0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30,
>     cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0,
>     is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1388
> #3  0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30,
>     cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1183
> #4  0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30,
>     proto=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/btf_encoder.c:811
> #5  0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220,
>     func=0x7f8c27225d88, f2=0x7f8c5805c560)
>     at /home/almagui/src/dwarves/btf_encoder.c:839
> #6  0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220,
>     fn=0x7f8c5805c560, func=0x7f8c27225d88)
>     at /home/almagui/src/dwarves/btf_encoder.c:871
> #7  0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220,
>     cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>)
>     at /home/almagui/src/dwarves/btf_encoder.c:1888
> #8  0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20,
>     conf_load=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/pahole.c:3342
> #9  0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3029
> #10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3036
> #11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0,
>     cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3243
> #12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3313
> #13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0
> #14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6
> 
> So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part
> of "all" and enabled for vmlinux/module BTF), we use the
> dwarves_fprintf.c code (function__prototype_conf() in the backtrace
> above) to write prototypes to check for inconsistent definitions.
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) print *(struct ptr_table *)0x7f8c60001e70
> $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> (gdb)
> 
> So it looks like the ptr_table has 2979 entries but entries is NULL;
> could there be an issue where CU initialization is not yet complete
> for some threads (it also happens very early in processing)? Can you
> reproduce this failure at your end? Thanks!
>

The following (when applied on top of the series) resolves the
segmentation fault for me:

diff --git a/pahole.c b/pahole.c
index 6c7e738..5ff0eaf 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3348,8 +3348,5 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
                if (conf_load->reproducible_build) {
                        ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
-                       // Equivalent to LSK__DELETE since we processed this
-                       cus__remove(cus, cu);
-                       cu__delete(cu);
                }
 out_btf:
                if (!thr_data) // See comment about reproducibe_build above


In other words, the problem is that each thread removes and deletes a CU
as soon as it is finished with it (that is, once its BTF has been
generated). However, the save/add_saved_funcs code stashes a CU
reference in the associated struct function * (to allow prototype
comparison for the same function in different CUs), so we end up with
stale CU references; in this case the freed/nulled ptr_table is what
triggered the crash. As far as I can see we need to retain CUs until all
BTF has been merged from threads.
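
To make the lifetime issue concrete, here is a minimal standalone sketch
in C. It is not the pahole code: the ptr_table field names come from the
gdb dump above and ptr_table__entry() mirrors dwarves.c:612, but struct
saved_func, the next_retained link, the retained__*() helpers and the
simplified cu__new()/cu__delete() are hypothetical names made up for
illustration. The idea is only that a CU referenced by a saved function
gets parked on a list instead of being freed, and the list is emptied
once nothing can look through those references any more.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Field names follow the gdb dump; everything else here is hypothetical. */
struct ptr_table {
	void **entries;
	uint32_t nr_entries;
	uint32_t allocated_entries;
};

struct cu {
	struct ptr_table types_table;
	struct cu *next_retained;	/* hypothetical deferred-delete link */
};

/* Hypothetical stand-in for a saved function that remembers its CU. */
struct saved_func {
	struct cu *cu;		/* borrowed pointer: valid only while the CU lives */
	uint32_t type_id;
};

/* Same bounds check as dwarves.c:612 in the backtrace. */
static void *ptr_table__entry(const struct ptr_table *pt, uint32_t id)
{
	return id >= pt->nr_entries ? NULL : pt->entries[id];
}

/* Simplified constructor: a CU with an empty type table of nr_types slots. */
static struct cu *cu__new(uint32_t nr_types)
{
	struct cu *cu = calloc(1, sizeof(*cu));

	if (cu == NULL)
		return NULL;
	cu->types_table.entries = calloc(nr_types, sizeof(void *));
	if (cu->types_table.entries == NULL) {
		free(cu);
		return NULL;
	}
	cu->types_table.nr_entries = cu->types_table.allocated_entries = nr_types;
	return cu;
}

static void cu__delete(struct cu *cu)
{
	free(cu->types_table.entries);
	free(cu);
}

/* Instead of deleting a CU when its thread is done, park it on a list. */
static void retained__add(struct cu **head, struct cu *cu)
{
	cu->next_retained = *head;
	*head = cu;
}

/* Call only after the last prototype comparison, i.e. after the merge. */
static unsigned int retained__delete_all(struct cu **head)
{
	unsigned int nr = 0;

	while (*head != NULL) {
		struct cu *cu = *head;

		*head = cu->next_retained;
		cu__delete(cu);
		nr++;
	}
	return nr;
}

/* Resolve a saved function's type through its (still live) CU. */
static void *saved_func__type(const struct saved_func *sf)
{
	assert(sf->cu != NULL);
	return ptr_table__entry(&sf->cu->types_table, sf->type_id);
}
```

In this model, calling cu__delete() before the last saved_func__type()
lookup would be the use-after-free seen in the backtrace; with
retained__add() the lookup stays valid until retained__delete_all() runs.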

With the fix in place, I'm seeing less than 100 msec difference between
reproducible/non-reproducible vmlinux BTF generation; that's great!

Alan

> Alan
> 
>> - Arnaldo
>>
>> Arnaldo Carvalho de Melo (12):
>>   core: Allow asking for a reproducible build
>>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>>   dwarf_loader: Introduce dwarf_cus__process_cu()
>>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>>   core: Add unlocked cus__add() variant
>>   core: Add cus__remove(), counterpart of cus__add()
>>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>>   core/dwarf_loader: Add functions to set state of CU processing
>>   pahole: Encode BTF serially in a reproducible build
>>   tests: Add a BTF reproducible generation test
>>
>>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>>  dwarves.h                   | 17 ++++++++
>>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>>  5 files changed, 264 insertions(+), 24 deletions(-)
>>  create mode 100755 tests/reproducible_build.sh
>>
> 