[PATCH dwarves v2 00/10] pahole: shared ELF and faster reproducible BTF encoding

This is v2 of the patchset that aims to speed up parallel BTF encoding
when the reproducible_build flag is set (see link [1]).

In comparison to v1:
  * patch #2 adding section-relative addresses to elf_functions is
    removed as unrelated [2]
  * patch #9 [3] is replaced with patches #8, #9 and #10 (the biggest
    and most important in this series)

Patch #10 rewrites the multithreading implementation to use a
job/worker model. See the commit message for details.
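
For readers unfamiliar with the pattern, here is a minimal, generic
sketch of a job/worker model in C with pthreads. It is not the code
from patch #10: all names (job_pool, process_job, run_jobs) are
hypothetical, and the per-job work is a stand-in. It only illustrates
how workers can pull jobs from a shared pool while results are merged
in job-id order, so the output does not depend on thread scheduling.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job pool; none of these names come from pahole. */
struct job_pool {
	pthread_mutex_t lock;
	int next_id;            /* next job id to hand out */
	int nr_jobs;            /* total number of jobs */
	unsigned long *results; /* results[id] is filled by whichever worker ran job id */
};

/* Stand-in for real per-job work (assumption: one job per unit of input). */
static unsigned long process_job(int id)
{
	return (unsigned long)id * id;
}

/* Each worker repeatedly takes the next free job id until none are left. */
static void *worker(void *arg)
{
	struct job_pool *pool = arg;

	for (;;) {
		int id;

		pthread_mutex_lock(&pool->lock);
		id = pool->next_id < pool->nr_jobs ? pool->next_id++ : -1;
		pthread_mutex_unlock(&pool->lock);
		if (id < 0)
			break;
		/* Each id is handed out once, so this slot has a single writer. */
		pool->results[id] = process_job(id);
	}
	return NULL;
}

/* Run nr_jobs jobs on nr_workers threads, then merge results in id order. */
static unsigned long run_jobs(int nr_workers, int nr_jobs)
{
	struct job_pool pool = { .next_id = 0, .nr_jobs = nr_jobs };
	pthread_t *threads = calloc(nr_workers, sizeof(*threads));
	unsigned long merged = 0;
	int i, err;

	pool.results = calloc(nr_jobs, sizeof(*pool.results));
	assert(threads && pool.results);
	pthread_mutex_init(&pool.lock, NULL);

	for (i = 0; i < nr_workers; i++) {
		err = pthread_create(&threads[i], NULL, worker, &pool);
		assert(err == 0);
		(void)err;
	}
	for (i = 0; i < nr_workers; i++)
		pthread_join(threads[i], NULL);

	/* Order-sensitive merge: same value for any number of workers. */
	for (i = 0; i < nr_jobs; i++)
		merged = merged * 31 + pool.results[i];

	pthread_mutex_destroy(&pool.lock);
	free(pool.results);
	free(threads);
	return merged;
}
```

Calling run_jobs(1, 50) and run_jobs(8, 50) yields the same value,
because the merge step walks the results by job id rather than by
completion order.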

The ./tests/tests pass with a vmlinux build on bpf-next.

I also confirmed that the reproducible bpftool dump of BTF produced
for vmlinux is identical between this patch series and pahole/next.

With this patch series, the performance of parallel BTF encoding is
comparable to non-reproducible runs on pahole/next. Depending on the
number of threads and allowed memory usage (indirectly controlled by
max_decoded_cus parameter of the queue in the dwarf_loader.c), it may
be a little slower or a little faster.

Note that, as reported by perf for -j24, this series uses significantly
fewer CPU cycles than non-reproducible pahole/next (about 46.8G vs 60.4G
cycles:u), although its wall-clock time is somewhat greater (2.37 vs
2.20 sec).

See sample measurements below (host nproc=24).

This patch series (always reproducible)

    -j1 mem 842020 Kb, time 6.31 sec
    -j3 mem 864604 Kb, time 2.90 sec
    -j6 mem 927760 Kb, time 2.21 sec
    -j12 mem 1026616 Kb, time 2.29 sec
    -j24 mem 1188448 Kb, time 2.36 sec
    -j48 mem 1462656 Kb, time 2.48 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        46,771,092,586      cycles:u                                                                ( +-  0.17% )

               2.36785 +- 0.00503 seconds time elapsed  ( +-  0.21% )

pahole/next (1cb4202) non-reproducible

    -j1 mem 834004 Kb, time 6.25 sec
    -j3 mem 976480 Kb, time 3.21 sec
    -j6 mem 1081432 Kb, time 2.36 sec
    -j12 mem 1161252 Kb, time 2.07 sec
    -j24 mem 1303060 Kb, time 2.13 sec
    -j48 mem 1537800 Kb, time 2.39 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        60,436,382,442      cycles:u                                                                ( +-  0.22% )

                2.2024 +- 0.0151 seconds time elapsed  ( +-  0.68% )

pahole/next (1cb4202) reproducible

    -j1 mem 4745764 Kb, time 7.64 sec
    -j3 mem 4744556 Kb, time 3.95 sec
    -j6 mem 4744592 Kb, time 2.98 sec
    -j12 mem 4744680 Kb, time 2.99 sec
    -j24 mem 4745252 Kb, time 2.99 sec
    -j48 mem 4744520 Kb, time 2.98 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        38,155,725,721      cycles:u                                                                ( +-  0.29% )

               3.00290 +- 0.00501 seconds time elapsed  ( +-  0.17% )

[1] https://lore.kernel.org/dwarves/20241128012341.4081072-1-ihor.solodrai@xxxxx/
[2] https://lore.kernel.org/dwarves/20241128012341.4081072-3-ihor.solodrai@xxxxx/
[3] https://lore.kernel.org/dwarves/20241128012341.4081072-10-ihor.solodrai@xxxxx/

Alan Maguire (2):
  btf_encoder: simplify function encoding
  btf_encoder: separate elf function, saved function representations

Ihor Solodrai (8):
  dwarf_loader: introduce pre_load_module hook to conf_load
  btf_encoder: introduce elf_functions struct type
  btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  btf_encoder: switch to shared elf_functions table
  btf_encoder: introduce btf_encoding_context
  btf_encoder: remove skip_encoding_inconsistent_proto
  dwarf_loader: introduce cu->id
  dwarf_loader: multithreading with a job/worker model

 btf_encoder.c               | 639 +++++++++++++++++++++---------------
 btf_encoder.h               |   8 +-
 btf_loader.c                |   2 +-
 ctf_loader.c                |   2 +-
 dwarf_loader.c              | 352 ++++++++++++++------
 dwarves.c                   |  44 ---
 dwarves.h                   |  21 +-
 pahole.c                    | 237 +++----------
 pdwtags.c                   |   3 +-
 pfunct.c                    |   3 +-
 tests/reproducible_build.sh |   5 +-
 11 files changed, 685 insertions(+), 631 deletions(-)

-- 
2.47.1






