Re: [PATCH dwarves 3/3] dwarf_loader: add option to merge more dwarf cu's into one pahole cu

On 3/26/21 4:21 PM, Andrii Nakryiko wrote:
On Wed, Mar 24, 2021 at 11:53 PM Yonghong Song <yhs@xxxxxx> wrote:

This patch adds an option "merge_cus", which permits
merging all debug info cu's into one pahole cu.
For vmlinux built with clang thin-lto or lto, cross-cu
type references exist. For example, you could have
   compile unit 1:
      tag 10:  type A
   compile unit 2:
      ...
        refer to type A (tag 10 in compile unit 1)
I only checked a few, but have seen that type A may be a simple type
like "unsigned char" or a complex type like an array of base types.

There are two different ways to resolve this issue:
(1). merge all compile units as one pahole cu so tags/types
      can be resolved easily, or
(2). try to do on-demand type traversal in other debuginfo cu's
      when we do die_process().
Method (2) is much more complicated, so I picked method (1).
An option "merge_cus" is added to permit such an operation.

Merging cu's will create a single cu with lots of types, tags
and functions. For example, with a clang thin-lto built vmlinux,
I saw 9M entries in the types table and 5.2M in the tags table.
Below are pahole wallclock times for different hashbits:
command line: time pahole -J --merge_cus vmlinux
       # of hashbits            wallclock time in seconds
           15                       460
           16                       255
           17                       131
           18                       97
           19                       75
           20                       69
           21                       64
           22                       62
           23                       58
           24                       64

What were the numbers for different hashbits without --merge_cus?

Without --merge_cus means non-lto vmlinux.
I just did a quick measurement: for hashbits 10 - 18,
every "pahole -J vmlinux" run took 37s - 39s,
with hashbits 10 - 15 at 37s - 38s and the rest at 38s - 39s.

The number of cus for my particular vmlinux is 2915.
The total number of types among all cus is roughly 8M based
on a rough regex matching, so each cu has roughly 2.7K types.

So the current default setting is okay for
non-lto vmlinux.
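
As a rough check of why the default is enough here, the sketch below divides the quoted type count by the quoted cu count and compares the result with the bucket count at 15 hashbits. The function names are illustrative only and are not part of pahole; the inputs are the approximate figures from this message.

```c
#include <stdio.h>

/* Average number of types per cu, truncated to an integer. */
static unsigned long avg_types_per_cu(unsigned long nr_types,
				      unsigned long nr_cus)
{
	return nr_types / nr_cus;
}

/* Average entries per bucket for a table with 2^hashbits buckets. */
static double avg_entries_per_bucket(unsigned long nr_entries,
				     unsigned int hashbits)
{
	return (double)nr_entries / (double)(1UL << hashbits);
}

/* Print the estimate for one (types, cus, hashbits) combination. */
static void print_per_cu_estimate(unsigned long nr_types,
				  unsigned long nr_cus,
				  unsigned int hashbits)
{
	unsigned long per_cu = avg_types_per_cu(nr_types, nr_cus);

	printf("types per cu: %lu, entries per bucket at %u bits: %.3f\n",
	       per_cu, hashbits, avg_entries_per_bucket(per_cu, hashbits));
}
```

Calling print_per_cu_estimate(8000000UL, 2915UL, 15) gives an average of 2744 types per cu against 32768 buckets, i.e. well under one entry per bucket, which is consistent with the timings barely moving across hashbits 10 - 18.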



Note that 24 hashbits performs worse than 23. The reason
could be that 23 hashbits already gives 8M buckets (close to
the 9M entries in the types table). A higher number of hash
bits allocates more memory and becomes less cache efficient
compared to 23 hashbits.
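
The bucket arithmetic behind that guess can be sketched as follows. This is only an illustration: the function names are made up for this example, and the memory estimate assumes one pointer-sized head per bucket, which may not match the real table layout.

```c
#include <stddef.h>

/* Number of buckets in a table indexed by `hashbits` bits. */
static unsigned long nr_buckets(unsigned int hashbits)
{
	return 1UL << hashbits;
}

/* Average chain length: entries divided by buckets. */
static double load_factor(unsigned long nr_entries, unsigned int hashbits)
{
	return (double)nr_entries / (double)nr_buckets(hashbits);
}

/*
 * Bytes used by the bucket array alone, ASSUMING each bucket head
 * is a single pointer (hypothetical layout for this sketch).
 */
static size_t bucket_array_bytes(unsigned int hashbits)
{
	return (size_t)nr_buckets(hashbits) * sizeof(void *);
}
```

With 9M entries, load_factor() is about 275 at 15 hashbits, about 1.07 at 23 and about 0.54 at 24. Going from 23 to 24 therefore halves chains that are already close to one entry long while doubling the bucket array, which fits the measured slowdown.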

This patch picks 21 hashbits as the starting value and
tries to allocate memory based on that. If the memory
allocation fails, we retry with fewer hashbits until
we reach 15 hashbits, which is the default for the
non merge-cu case.
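
A minimal sketch of that fallback, not the code from the patch: `struct bucket` and `alloc_buckets_with_fallback()` are hypothetical names used only for illustration, and plain calloc() stands in for whatever allocator the loader uses.

```c
#include <stdlib.h>

#define MAX_HASHBITS 21	/* starting value from the description above */
#define MIN_HASHBITS 15	/* default for the non merge-cu case */

/* Hypothetical stand-in for a hash bucket head. */
struct bucket {
	void *first;
};

/*
 * Try to allocate a zeroed bucket array with 2^max_bits entries,
 * dropping one bit per failed attempt down to min_bits. On success
 * the bit count actually used is stored in *bits_out. Returns NULL
 * if every attempt fails.
 */
static struct bucket *alloc_buckets_with_fallback(unsigned int max_bits,
						  unsigned int min_bits,
						  unsigned int *bits_out)
{
	for (unsigned int bits = max_bits; bits >= min_bits; bits--) {
		struct bucket *tbl = calloc((size_t)1 << bits, sizeof(*tbl));

		if (tbl != NULL) {
			*bits_out = bits;
			return tbl;
		}
		if (bits == 0)	/* avoid unsigned wraparound when min_bits is 0 */
			break;
	}
	return NULL;
}
```

A caller would pass MAX_HASHBITS and MIN_HASHBITS, then use the returned bit count when hashing so that lookups and inserts agree on the table size.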

Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
  dwarf_loader.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
  dwarves.h      |  2 ++
  pahole.c       |  8 +++++
  3 files changed, 100 insertions(+)


[...]