Re: [PATCH dwarves 0/2] Parallelize BTF type info generating of pahole

On January 20, 2022 4:59:08 PM GMT-03:00, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>+cc bpf@xxxxxxxxxxxxxxx
>
>On Wed, Jan 19, 2022 at 5:08 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
>>
>> Creating an instance of btf for each worker thread allows the
>> steal function provided by pahole to add type info on multiple threads
>> without a lock.  The main thread then merges the results of the worker
>> threads into the primary instance.
>>
>> Copying data from per-thread btf instances to the primary instance is
>> expensive now.  However, a patch that speeds this up has landed in the
>> bpf-next repository. [1]  Together, that bpf-next patch and these patches
>> drop the total runtime from 6.0s to 5.4s with "-j4" on my device when
>> generating BTF for Linux.
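
For readers following along, the per-thread scheme described above can be sketched as a toy model. This is only an illustration under stated assumptions, not pahole's or libbpf's code: `ToyBTF`, `encode_chunk` and `encode_parallel` are hypothetical names, a "type" is just a (name, referenced ID) pair, and deduplication is not modelled. The merge step mimics the general idea of appending one instance to another and shifting type IDs, which is the copy that the cover letter calls expensive.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyBTF:
    """Hypothetical stand-in for a btf instance: an append-only type list."""

    def __init__(self):
        # Type IDs are 1-based list positions; 0 means "no reference".
        self.types = []

    def add_type(self, name, ref=0):
        self.types.append((name, ref))
        return len(self.types)  # ID of the type just added

    def add_btf(self, src):
        """Append every type from src, shifting its internal references."""
        offset = len(self.types)
        for name, ref in src.types:
            self.types.append((name, ref + offset if ref else 0))


def encode_chunk(cus):
    """Worker: fill a private instance, so no lock is needed."""
    local = ToyBTF()
    for cu in cus:
        base = local.add_type(cu + "_int")
        local.add_type(cu + "_ptr", ref=base)  # points at the type above
    return local


def encode_parallel(cus, jobs):
    # One chunk of compilation units, and one private instance, per worker.
    chunks = [cus[i::jobs] for i in range(jobs)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        per_thread = list(pool.map(encode_chunk, chunks))
    # Only the main thread touches the primary instance.
    primary = ToyBTF()
    for local in per_thread:
        primary.add_btf(local)
    return primary


merged = encode_parallel(["cu0", "cu1", "cu2"], jobs=2)
for type_id, (name, ref) in enumerate(merged.types, start=1):
    print(type_id, name, ref)
```

The point of the design is that workers never share mutable state; the only serial part is the final loop in the main thread, so the cost of that loop bounds how much the extra threads can help.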
>
>Just a few more data points. I've tried this locally with 40 cores,
>both with and without the libbpf's btf__add_btf() optimization.
>
>BASELINE NON-PARALLEL
>=====================
>$ time ./pahole -J ~/linux-build/default/vmlinux
>./pahole -J ~/linux-build/default/vmlinux  11.17s user 0.66s system
>99% cpu 11.832 total
>
>BASELINE PARALLEL
>=================
>$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
>./pahole -j40 -J ~/linux-build/default/vmlinux  13.85s user 0.75s
>system 290% cpu 5.023 total
>
>THESE PATCHES WITHOUT LIBBPF SPEED-UP
>=====================================
>$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
>./pahole -j40 -J ~/linux-build/default/vmlinux  25.94s user 1.15s
>system 685% cpu 3.954 total
>
>THESE PATCHES WITH LATEST LIBBPF SPEED-UP
>=========================================
>$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
>./pahole -j40 -J ~/linux-build/default/vmlinux  27.49s user 1.08s
>system 858% cpu 3.328 total
>
>
>So on 40 cores, it's a speed up from 11.8 seconds non-parallel, to 5s
>parallel without Kui-Feng's changes, to 4s with Kui-Feng's changes, to
>3.3s after libbpf update (I did it locally, will sync this to Github
>today).
>
>4x speed up, not bad!
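
As a quick sanity check on the figures quoted above, the speed-ups can be recomputed from the wall-clock totals reported by `time`. This is just arithmetic on the numbers in this mail; the dictionary labels and the helper names are made up for the example.

```python
# (user seconds, system seconds, wall-clock seconds) copied from the runs above.
runs = {
    "baseline non-parallel": (11.17, 0.66, 11.832),
    "baseline parallel": (13.85, 0.75, 5.023),
    "patches, no libbpf speed-up": (25.94, 1.15, 3.954),
    "patches, libbpf speed-up": (27.49, 1.08, 3.328),
}

baseline_wall = runs["baseline non-parallel"][2]


def speedup(wall):
    """Wall-clock speed-up relative to the non-parallel baseline."""
    return baseline_wall / wall


def cpu_percent(user, system, wall):
    """CPU utilisation as `time` reports it: CPU seconds per wall second."""
    return 100.0 * (user + system) / wall


speedups = {name: speedup(wall) for name, (_, _, wall) in runs.items()}

for name, (user, system, wall) in runs.items():
    print(f"{name:30s} {speedups[name]:.2f}x  "
          f"{cpu_percent(user, system, wall):.0f}% cpu")
```

On these numbers the final configuration comes out at roughly 3.6x over the non-parallel baseline, so "4x" is a rounded figure. Total CPU time also more than doubles, which suggests the wall-clock gain is bought with extra work in the worker threads.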

That's indeed excellent! From the limited review I've done so far, it looks good and takes advantage of the legwork I did. I won't change the patches, just split them up a bit more, go through your review comments, and repost tomorrow morning.

- Arnaldo
>
>But parallel mode is not currently enabled in kernel build, let's
>enable parallel mode and save those seconds during the kernel build!



>
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266
>>
>> Kui-Feng Lee (2):
>>   dwarf_loader: Prepare and pass per-thread data to worker threads.
>>   pahole: Use per-thread btf instances to avoid mutex locking.
>>
>>  btf_encoder.c  |   5 +++
>>  btf_encoder.h  |   2 +
>>  btf_loader.c   |   2 +-
>>  ctf_loader.c   |   2 +-
>>  dwarf_loader.c |  58 ++++++++++++++++++------
>>  dwarves.h      |   9 +++-
>>  pahole.c       | 120 ++++++++++++++++++++++++++++++++++++++++++++++---
>>  pdwtags.c      |   3 +-
>>  pfunct.c       |   4 +-
>>  9 files changed, 180 insertions(+), 25 deletions(-)
>>
>> --
>> 2.30.2
>>