Re: [PATCH dwarves 0/2] Parallelize BTF type info generating of pahole

+cc bpf@xxxxxxxxxxxxxxx

On Wed, Jan 19, 2022 at 5:08 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
>
> Creating an instance of btf for each worker thread allows
> steal-function provided by pahole to add type info on multiple threads
> without a lock.  The main thread merges the results of worker threads
> to the primary instance.
>
> Copying data from per-thread btf instances to the primary instance is
> expensive now.  However, there is a patch landed at the bpf-next
> repository. [1] With the patch for bpf-next and this patch, they drop
> total runtime to 5.4s from 6.0s with "-j4" on my device to generate
> BTF for Linux.
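
For anyone following along, here is a rough sketch of the per-thread idea
described above. It is not pahole or libbpf code: the names encode_chunk()
and merge() and the toy type names are made up for illustration, and plain
Python lists stand in for btf instances. Each worker fills its own private
table without any lock, and the main thread then appends each table to the
primary one, shifting the local IDs by the number of entries already there.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_chunk(names):
    """Worker: build a private table with local, 1-based IDs (no lock needed)."""
    local = []
    for name in names:
        local.append({"id": len(local) + 1, "name": name})
    return local


def merge(primary, local):
    """Main thread: append one worker's table, shifting its local IDs."""
    offset = len(primary)
    for entry in local:
        primary.append({"id": entry["id"] + offset, "name": entry["name"]})


# Hypothetical work split: one chunk of type names per worker.
chunks = [["int", "char"], ["struct list_head"], ["long", "struct page", "u64"]]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, so the merge order is deterministic.
    results = list(pool.map(encode_chunk, chunks))

primary = []
for local in results:
    merge(primary, local)

for entry in primary:
    print(entry["id"], entry["name"])
```

The merge step is the only serial part, which is why the cost of copying
from the per-thread instances into the primary one matters so much.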

Just a few more data points. I've tried this locally with 40 cores,
both with and without libbpf's btf__add_btf() optimization.

BASELINE NON-PARALLEL
=====================
$ time ./pahole -J ~/linux-build/default/vmlinux
./pahole -J ~/linux-build/default/vmlinux  11.17s user 0.66s system
99% cpu 11.832 total

BASELINE PARALLEL
=================
$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
./pahole -j40 -J ~/linux-build/default/vmlinux  13.85s user 0.75s
system 290% cpu 5.023 total

THESE PATCHES WITHOUT LIBBPF SPEED-UP
=====================================
$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
./pahole -j40 -J ~/linux-build/default/vmlinux  25.94s user 1.15s
system 685% cpu 3.954 total

THESE PATCHES WITH LATEST LIBBPF SPEED-UP
=========================================
$ time ./pahole -j40 -J ~/linux-build/default/vmlinux
./pahole -j40 -J ~/linux-build/default/vmlinux  27.49s user 1.08s
system 858% cpu 3.328 total


So on 40 cores, wall-clock time goes from 11.8s non-parallel, to 5.0s
parallel without Kui-Feng's changes, to 4.0s with Kui-Feng's changes, to
3.3s after the libbpf update (I did it locally, will sync this to GitHub
today).

Roughly a 3.6x speed-up overall (11.8s down to 3.3s), not bad!
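
The ratios can be checked directly from the wall-clock totals reported by
time(1) in the four runs above. A small sketch (the dictionary keys are just
labels I picked for the runs; only the numbers come from the measurements):

```python
# Wall-clock totals in seconds, copied from the four runs above.
wall = {
    "baseline non-parallel": 11.832,
    "baseline parallel (-j40)": 5.023,
    "patches, no libbpf speed-up": 3.954,
    "patches + libbpf speed-up": 3.328,
}

base = wall["baseline non-parallel"]

# Speed-up of each configuration relative to the non-parallel baseline.
speedups = {name: base / secs for name, secs in wall.items()}

for name, ratio in speedups.items():
    print(f"{name:30s} {wall[name]:6.3f}s  {ratio:.2f}x")
```

Note that user CPU time grows in the patched runs (25.94s and 27.49s versus
11.17s), so the gain is in elapsed time, not in total work done.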

Parallel mode is not currently enabled in the kernel build, though. Let's
enable it and save those seconds during the kernel build!

>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266
>
> Kui-Feng Lee (2):
>   dwarf_loader: Prepare and pass per-thread data to worker threads.
>   pahole: Use per-thread btf instances to avoid mutex locking.
>
>  btf_encoder.c  |   5 +++
>  btf_encoder.h  |   2 +
>  btf_loader.c   |   2 +-
>  ctf_loader.c   |   2 +-
>  dwarf_loader.c |  58 ++++++++++++++++++------
>  dwarves.h      |   9 +++-
>  pahole.c       | 120 ++++++++++++++++++++++++++++++++++++++++++++++---
>  pdwtags.c      |   3 +-
>  pfunct.c       |   4 +-
>  9 files changed, 180 insertions(+), 25 deletions(-)
>
> --
> 2.30.2
>
