Creating an instance of btf for each worker thread allows the steal function provided by pahole to add type info on multiple threads without a lock. The main thread merges the results of the worker threads into the primary instance.

Copying data from the per-thread btf instances to the primary instance is currently expensive. However, a patch addressing this has landed in the bpf-next repository [1]. With that bpf-next patch and this series applied, the total runtime to generate BTF for Linux with "-j4" drops from 6.0s to 5.4s on my device.

V2 fixes typos and syntax according to the comments received on v1. It also splits part 1 of v1 into parts 1 and 2.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266
[v1] https://lore.kernel.org/dwarves/20220120010817.2803482-1-kuifeng@xxxxxx/

Kui-Feng Lee (3):
  dwarf_loader: Receive per-thread data on worker threads.
  dwarf_loader: Prepare and pass per-thread data to worker threads.
  pahole: Use per-thread btf instances to avoid mutex locking.

 btf_encoder.c  |   5 +++
 btf_encoder.h  |   2 +
 btf_loader.c   |   2 +-
 ctf_loader.c   |   2 +-
 dwarf_loader.c |  58 ++++++++++++++++++------
 dwarves.h      |   9 +++-
 pahole.c       | 117 ++++++++++++++++++++++++++++++++++++++++++++++---
 pdwtags.c      |   3 +-
 pfunct.c       |   4 +-
 9 files changed, 177 insertions(+), 25 deletions(-)

-- 
2.30.2