Re: [PATCH dwarves 3/3] dwarf_loader: add option to merge more dwarf cu's into one pahole cu

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 26 Mar 2021 16:21:23 -0700

On Wed, Mar 24, 2021 at 11:53 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> This patch adds an option "merge_cus", which permits merging
> all debug info cu's into one pahole cu.
> For vmlinux built with clang thin-lto or lto, there exist
> cross cu type references. For example, you could have
>   compile unit 1:
>      tag 10:  type A
>   compile unit 2:
>      ...
>        refer to type A (tag 10 in compile unit 1)
> I only checked a few cases, but type A may be a simple type
> like "unsigned char" or a complex type like an array of base types.
>
> There are two different ways to resolve this issue:
> (1). merge all compile units as one pahole cu so tags/types
>      can be resolved easily, or
> (2). try to do on-demand type traversal in other debuginfo cu's
>      when we do die_process().
> Method (2) is much more complicated, so I picked method (1).
> An option "merge_cus" is added to permit such an operation.
>
> Merging cu's will create a single cu with lots of types, tags
> and functions. For example with clang thin-lto built vmlinux,
> I saw 9M entries in the types table and 5.2M in the tags table.
> Below are pahole wallclock times for different hashbits:
> command line: time pahole -J --merge_cus vmlinux
>       # of hashbits            wallclock time in seconds
>           15                       460
>           16                       255
>           17                       131
>           18                       97
>           19                       75
>           20                       69
>           21                       64
>           22                       62
>           23                       58
>           24                       64

What were the numbers for different hashbits without --merge_cus?

>
> Note that 24 hashbits performs worse than 23. The reason
> could be that 23 hashbits cover 8M buckets (close to the 9M
> entries in the types table). A higher number of hashbits
> allocates more memory and becomes less cache efficient
> compared to 23 hashbits.
>
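
A rough way to sanity-check the reasoning above is to compute the average
number of entries per bucket for each hashbits value. The sketch below is
only a back-of-the-envelope model: it assumes a uniformly distributing hash,
uses the ~9M types-table entry count quoted above, and says nothing about
cache effects. The helper names are made up for illustration and are not
part of pahole.

```c
#include <stdint.h>

/* Number of buckets in a table sized with the given number of hashbits. */
static uint64_t nr_buckets(unsigned int hashbits)
{
	return (uint64_t)1 << hashbits;
}

/*
 * Average entries per bucket, assuming entries spread uniformly.
 * Illustrative helper only, not a pahole function.
 */
static double avg_chain_len(uint64_t nr_entries, unsigned int hashbits)
{
	return (double)nr_entries / (double)nr_buckets(hashbits);
}
```

With 9M entries this gives roughly 275 entries per bucket at 15 hashbits
and just over 1 at 23 hashbits, so going beyond 23 cannot shorten the
chains much further.
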
> This patch picks 21 hashbits as the starting value and
> tries to allocate memory based on that. If memory allocation
> fails, it retries with fewer hashbits until it reaches
> 15 hashbits, which is the default for the non merge-cu case.
>
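
For readers following along, the fallback described above could look
roughly like the sketch below. It is a minimal illustration, not the code
from the patch: the struct, the macro names and alloc_buckets() are
hypothetical, and only the 21 and 15 values come from the description.

```c
#include <stddef.h>
#include <stdlib.h>

#define HASHBITS_MERGED_START 21 /* starting value from the description */
#define HASHBITS_DEFAULT      15 /* default for the non merge-cu case */

/* Hypothetical stand-in for a hash bucket head. */
struct bucket_head {
	void *first;
};

/*
 * Try to allocate 2^start_bits buckets; on failure retry with one bit
 * less, down to min_bits. Returns NULL if even min_bits fails.
 * On success, *bits_out holds the number of bits actually used.
 */
static struct bucket_head *alloc_buckets(unsigned int start_bits,
					 unsigned int min_bits,
					 unsigned int *bits_out)
{
	unsigned int bits = start_bits;

	for (;;) {
		struct bucket_head *tbl = calloc((size_t)1 << bits, sizeof(*tbl));

		if (tbl != NULL) {
			*bits_out = bits;
			return tbl;
		}
		if (bits <= min_bits)
			return NULL;
		bits--;
	}
}
```

A caller would pass HASHBITS_MERGED_START and HASHBITS_DEFAULT and then
use the returned bit count for all later lookups, since the table size
is no longer a compile-time constant.
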
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---
>  dwarf_loader.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  dwarves.h      |  2 ++
>  pahole.c       |  8 +++++
>  3 files changed, 100 insertions(+)
>

[...]