Creating an instance of btf for each worker thread allows the steal
function provided by pahole to add type information from multiple
threads without holding a lock.  The main thread then merges the
results of the worker threads into the primary btf instance.

Copying data from the per-thread btf instances to the primary instance
used to be expensive.  However, a patch addressing this has landed in
the bpf-next repository. [1]  With that bpf-next patch and this series,
the total runtime to generate BTF for Linux drops from 6.0s to 5.4s
with "-j4" on my device.

V4 includes the following changes:

 - Fix nits and typos.

 - Roll back to calling btf__add_btf() on the main thread to simplify
   the code.  The reasons are that an additional lock would complicate
   the code, and that not reusing the btf_encoder is noticeably slower.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266

[v1] https://lore.kernel.org/dwarves/20220120010817.2803482-1-kuifeng@xxxxxx/
[v2] https://lore.kernel.org/dwarves/20220124191858.1601255-1-kuifeng@xxxxxx/
[v3] https://lore.kernel.org/dwarves/20220126040509.1862767-1-kuifeng@xxxxxx/

Kui-Feng Lee (4):
  dwarf_loader: Receive per-thread data on worker threads.
  dwarf_loader: Prepare and pass per-thread data to worker threads.
  pahole: Use per-thread btf instances to avoid mutex locking.
  libbpf: Update libbpf to a new revision.

 btf_encoder.c  |  25 ++++----
 btf_encoder.h  |   2 +
 btf_loader.c   |   4 +-
 ctf_loader.c   |   2 +-
 dwarf_loader.c |  59 ++++++++++++++++++-----
 dwarves.h      |   9 +++-
 lib/bpf        |   2 +-
 pahole.c       | 128 +++++++++++++++++++++++++++++++++++++++++++++----
 pdwtags.c      |   3 +-
 pfunct.c       |   4 +-
 10 files changed, 198 insertions(+), 40 deletions(-)

-- 
2.30.2
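
For illustration, here is a minimal sketch of the merge pattern
described above: each worker thread builds types into its own btf
instance without locking, and the main thread appends every per-thread
instance into the primary one with libbpf's btf__add_btf().  This is
not the pahole code from this series; NR_WORKERS, the worker callback,
and the example type it adds are hypothetical stand-ins.

/*
 * Sketch only: per-thread BTF construction merged on the main thread.
 * Names such as NR_WORKERS and worker() are hypothetical; error
 * handling is simplified.
 */
#include <pthread.h>
#include <stdio.h>
#include <linux/btf.h>   /* BTF_INT_SIGNED */
#include <bpf/btf.h>     /* btf__new_empty(), btf__add_btf(), ... */

#define NR_WORKERS 4

static struct btf *thread_btf[NR_WORKERS];

/* Each worker adds types to its own btf instance, so no lock is needed. */
static void *worker(void *arg)
{
	long i = (long)arg;

	thread_btf[i] = btf__new_empty();
	if (!thread_btf[i])
		return NULL;

	/* Stand-in for the real type encoding done per compile unit. */
	btf__add_int(thread_btf[i], "int", 4, BTF_INT_SIGNED);
	return NULL;
}

int main(void)
{
	struct btf *primary = btf__new_empty();
	pthread_t tid[NR_WORKERS];
	long i;

	for (i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);

	for (i = 0; i < NR_WORKERS; i++) {
		pthread_join(tid[i], NULL);
		if (!thread_btf[i])
			continue;
		/* Merge: append all types from the per-thread btf. */
		if (btf__add_btf(primary, thread_btf[i]) < 0)
			fprintf(stderr, "failed to merge btf from worker %ld\n", i);
		btf__free(thread_btf[i]);
	}

	printf("primary has %u types\n", btf__type_cnt(primary));
	btf__free(primary);
	return 0;
}

A sketch like this links against libbpf and pthreads, e.g.
"gcc merge_sketch.c -lbpf -lpthread" (file name hypothetical).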