Re: [PATCH v3 dwarves 0/8] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Em Wed, Feb 08, 2023 at 01:20:39PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Feb 07, 2023 at 05:14:54PM +0000, Alan Maguire escreveu:
> > At optimization level -O2 or higher in gcc, static functions may be
> > optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> > These represent 
> >     
> > - constant propagation (.constprop.0);
> > - interprocedural scalar replacement of aggregates, removal of
> >   unused parameters and replacement of parameters passed by
> >   reference by parameters passed by value (.isra.0)
> 
> Initial test, without using the new options:
> 
> [acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
>       3 start_show
>       3 timeout_show
>       3 uuid_show
>       4 m_next
>       4 parse_options
>       4 sk_diag_fill
>       4 state_show
>       4 state_store
>       5 status_show
>       6 type_show
> [acme@pumpkin ~]$
> 
> Now I'll use --skip_encoding_btf_inconsistent_proto and --btf_gen_optimized

With:

⬢[acme@toolbox linux]$ git diff
diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
index 1f1f1d397c399afe..9dd185fb1ff1fc3b 100755
--- a/scripts/pahole-flags.sh
+++ b/scripts/pahole-flags.sh
@@ -21,7 +21,7 @@ if [ "${pahole_ver}" -ge "122" ]; then
 fi
 if [ "${pahole_ver}" -ge "124" ]; then
        # see PAHOLE_HAS_LANG_EXCLUDE
-       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust"
+       extra_paholeopt="${extra_paholeopt} --lang_exclude=rust --btf_gen_optimized --skip_encoding_btf_inconsistent_proto"
 fi

 echo ${extra_paholeopt}
⬢[acme@toolbox linux]$

I get:

[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | tail
      1 zswap_writeback_entry
      1 zswap_zpool_param_set
      1 zs_zpool_create
      1 zs_zpool_destroy
      1 zs_zpool_free
      1 zs_zpool_malloc
      1 zs_zpool_map
      1 zs_zpool_shrink
      1 zs_zpool_total_size
      1 zs_zpool_unmap
[acme@pumpkin ~]$

No functions with more than one entry:

[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep -v ' 1 '
[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | sort | uniq -c | sort -n | grep ' 1 ' | wc -l
54558
[acme@pumpkin ~]$ pfunct /sys/kernel/btf/vmlinux  | wc -l
54558
[acme@pumpkin ~]$

So I'll bump the release as we did in the past when testing features
that we need to test against a release on the pahole-flags.sh script so
that we can do further tests.

- Arnaldo
 
> - Arnaldo
>    
> > See [1] for details. 
> >     
> > Currently BTF encoding does not handle such optimized functions
> > that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
> > This is safer because such suffixes can often indicate parameters have
> > been optimized out.  This series addresses this by matching a
> > function to a suffixed version ("foo" matching "foo.isra.0") while
> > ensuring that the function signature does not contain optimized-out
> > parameters.  Note that if the function is found ("foo") it will
> > be preferred, only falling back to "foo.isra.0" if lookup of the
> > function fails.  Addition to BTF is skipped if the function has
> > optimized-out parameters, since the expected function signature
> > will not match. BTF encoding does not include the "."-suffix to
> > be consistent with DWARF. In addition, the kernel currently does
> > not allow a "." suffix in a BTF function name.
> > 
> > A problem with this approach however is that BTF carries out the
> > encoding process in parallel across multiple CUs, and sometimes
> > a function has optimized-out parameters in one CU but not others;
> > we see this for NF_HOOK.constprop.0 for example.  So in order to
> > determine if the function has optimized-out parameters in any
> > CU, its addition is not carried out until we have processed all
> > CUs and are about to merge BTF.  At this point we know if any
> > such optimizations have occurred.  Patches 1-5 handle the
> > optimized-out parameter identification and matching "."-suffixed
> > functions with the original function to facilitate BTF
> > encoding.  This feature can be enabled via the
> > "--btf_gen_optimized" option.
> > 
> > Patch 6 addresses a related problem - it is entirely possible
> > for a static function of the same name to exist in different
> > CUs with different function signatures.  Because BTF does not
> > currently encode any information that would help disambiguate
> > which BTF function specification matches which static function
> > (in the case of multiple different function signatures), it is
> > best to eliminate such functions from BTF for now.  The same
> > mechanism that is used to compare static "."-suffixed functions
> > is re-used for the static function comparison.  A superficial
> > comparison of number of parameters/parameter names is done to
> > see if such representations are consistent, and if inconsistent
> > prototypes are observed, the function is flagged for exclusion
> > from BTF.
> > 
> > When these methods are combined - the additive encoding of
> > "."-suffixed functions and the subtractive elimination of
> > functions with inconsistent parameters - we see an overall
> > drop in the number of functions in vmlinux BTF, from
> > 51529 to 50246.  Skipping inconsistent functions is enabled
> > via "--skip_encoding_btf_inconsistent_proto".
> > 
> > Changes since v2 [2]
> > - Arnaldo incorporated some of the suggestions in the v2 thread;
> >   these patches are based on those; the relevant changes are
> >   noted as committer changes.
> > - Patch 1 is unchanged from v2, but the rest of the patches
> >   have been updated:
> > - Patch 2 separates out the changes to the struct btf_encoder
> >   that better support later addition of functions.
> > - Patch 3 then is changed insofar as these changes are no
> >   longer needed for the function addition refactoring.
> > - Patch 4 has a small change; we need to verify that an
> >   encoder has actually been added to the encoders list
> >   prior to removal
> > - Patch 5 changed significantly; when attempting to measure
> >   performance the relatively good numbers attained when using
> >   delayed function addition were not reproducible.
> >   Further analysis revealed that the large number of lookups
> >   caused by the presence of the separate function tree was
> >   a major cause of performance degradation in the multi
> >   threaded case.  So instead of maintaining a separate tree,
> >   we use the ELF function list which we already need to look
> >   up to match ELF -> DWARF function descriptions to store
> >   the function representation.  This has 2 benefits; firstly
> >   as mentioned, we already look up the ELF function so no
> >   additional lookup is required to save the function.
> >   Secondly, the ELF representation is identical for each
> >   encoder, so we can index the same function across multiple
> >   encoder function arrays - this greatly speeds up the
> >   processing of comparing function representations across
> >   encoders.  There is still a performance cost in this
> >   approach however; more details are provided in patch 6.
> >   An option specific to adding functions with "." suffixes
> >   is added "--btf_gen_optimized"
> > - Patch 6 builds on patch 5 in applying the save/merge/add
> >   approach for all functions using the same mechanisms.
> >   In addition the "--skip_encoding_btf_inconsistent_proto"
> >   option is introduced.
> > - Patches 7/8 document the new options in the pahole manual
> >   page.
> >   
> > Changes since v1 [3]
> > 
> > - Eduard noted that a DW_AT_const_value attribute can signal
> >   an optimized-out parameter, and that the lack of a location
> >   attribute signals optimization; ensure we handle those cases
> >   also (Eduard, patch 1).
> > - Jiri noted we can have inconsistencies between a static
> >   and non-static function; apply the comparison process to
> >   all functions (Jiri, patch 5)
> > - segmentation fault was observed when handling functions with
> >   > 10 parameters; needed parameter comparison loop to exit
> >   at BTF_ENCODER_MAX_PARAMETERS (patch 5)
> > - Kui-Feng Lee pointed out that having a global shared function
> >   tree would lead to a lot of contention; here a per-encoder 
> >   tree is used, and once the threads are collected the trees
> >   are merged. Performance numbers are provided in patch 5 
> >   (Kui-Feng Lee, patches 4/5)
> > 
> > [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> > [2] https://lore.kernel.org/bpf/1675088985-20300-1-git-send-email-alan.maguire@xxxxxxxxxx/
> > [3] https://lore.kernel.org/bpf/1674567931-26458-1-git-send-email-alan.maguire@xxxxxxxxxx/
> > 
> > Alan Maguire (8):
> >   dwarf_loader: Help spotting functions with optimized-out parameters
> >   btf_encoder: store type_id_off, unspecified type in encoder
> >   btf_encoder: Refactor function addition into dedicated
> >     btf_encoder__add_func
> >   btf_encoder: Rework btf_encoders__*() API to allow traversal of
> >     encoders
> >   btf_encoder: Represent "."-suffixed functions (".isra.0") in BTF
> >   btf_encoder: support delaying function addition to check for function
> >     prototype inconsistencies
> >   dwarves: document --btf_gen_optimized option
> >   dwarves: document --skip_encoding_btf_inconsistent_proto option
> > 
> >  btf_encoder.c      | 360 +++++++++++++++++++++++++++++++++++++--------
> >  btf_encoder.h      |   6 -
> >  dwarf_loader.c     | 130 +++++++++++++++-
> >  dwarves.h          |  11 +-
> >  man-pages/pahole.1 |  10 ++
> >  pahole.c           |  30 +++-
> >  6 files changed, 468 insertions(+), 79 deletions(-)
> > 
> > -- 
> > 2.31.1
> > 
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux