Creating an instance of btf for each worker thread allows the steal
function provided by pahole to add type info on multiple threads
without a lock. The main thread then merges the results from the
worker threads into the primary instance.

Copying data from the per-thread btf instances to the primary instance
is currently expensive. However, a patch addressing this has landed in
the bpf-next repository [1]. Together, the bpf-next patch and this
patch set reduce the total runtime from 6.0s to 5.4s with "-j4" on my
device when generating BTF for Linux.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266

Kui-Feng Lee (2):
  dwarf_loader: Prepare and pass per-thread data to worker threads.
  pahole: Use per-thread btf instances to avoid mutex locking.

 btf_encoder.c  |   5 +++
 btf_encoder.h  |   2 +
 btf_loader.c   |   2 +-
 ctf_loader.c   |   2 +-
 dwarf_loader.c |  58 ++++++++++++++++++------
 dwarves.h      |   9 +++-
 pahole.c       | 120 ++++++++++++++++++++++++++++++++++++++++++++++---
 pdwtags.c      |   3 +-
 pfunct.c       |   4 +-
 9 files changed, 180 insertions(+), 25 deletions(-)

-- 
2.30.2