On 04/10/2024 06:13, Ihor Solodrai wrote: > This patchset changes how BTF encoders collect and store function > information in a table searcheable by name, where their > btf_encoder_function_state is maintained. > hi Ihor I'll take a more detailed look shortly, but I'd suggest respinning the series against the updated "next" branch shortly as there are changes in flight that Stephen is working on to encode variables; they significantly change the processing of variables, see [1] for details. They have partly landed and the rest will likely land soon. It might make things easier to wait until those changes land and respin, and I don't think they invalidate the approach you suggest, just will require some merging around variable handling I think. [1] https://lore.kernel.org/dwarves/8734ldm72u.fsf@xxxxxxxxxx/ > Previously a separate array of elf_function structs was built and > stored by each btf_encoder, and when the encoders were merged before > dumping BTF their function tables would be traversed and merged too. > > With these changes, the table used for searching for elf_function > becomes shared between encoder threads. However, additional > information in the table and in each individual elf_function has to be > stored to accomodate the sharing. > > The memory footprint is now slightly higher, although it degrades > faster when number of threads is increased without this patchset. > > /usr/bin/time -v with -j8: > > With patchset: Maximum resident set size (kbytes): 1402792 (~2% higher) > Without: Maximum resident set size (kbytes): 1376700 > > /usr/bin/time -v with -j16: > > With patchset: Maximum resident set size (kbytes): 1456044 > Without: Maximum resident set size (kbytes): 1450748 > > The idea to make function table shared between encoders has been > mentioned previously [1]. > > I've been looking into ways of improving parallel reproducible_build > time, ideally making it as fast as non-reproducible. In an off-list > discussion Eduard Zingerman suggested creating a btf_encoder for each > input compilation unit. This would allow to enforce the order of BTF > merge after btf_encoders are done. > > With each encoder maintaining it's own table this idea proved > infeasible [2]. And so making it shared became a requirement. > > I implemented a prototype that improved the speed of parallel > reproducible_build by using a similar shared table and implementing > Eduard's idea [3]. On master (84ab453) reproducible_build was ~50% > slower than non-reproducible, and with my experimental changes it was > 20% slower. > > This patchset stems from that work and discussions. The longer term > goal is to make BTF encoding reproducible by default, and fast. > > [1] https://lore.kernel.org/dwarves/20240909155031.1596249-1-alan.maguire@xxxxxxxxxx/ > [2] https://lore.kernel.org/dwarves/aQtEzThpIhRiwQpkgeZepPGr5Mt_zuGfPbxQxZgS6SSoPYoqM1afjDqEmIZkdH3YzvhWmWwqCS_8ZvFTcSZHvpkAeBpLRTAZEmrOhq0svfo=@pm.me/ > [3] https://github.com/theihor/dwarves/pull/1/files > > Ihor Solodrai (6): > dwarf_loader: introduce pre/post cus__load_module hooks to conf_load > btf_encoder: split btf_encoder__collect_symbols() > btf_encoder: introduce elf_functions struct type > btf_encoder: collect elf_functions in pre_cus__load_module > btf_encoder: introduce elf_functions_entry > btf_encoder: switch to shared elf_functions table > > btf_encoder.c | 341 ++++++++++++++++++++++++++++++++++++------------- > btf_encoder.h | 5 +- > dwarf_loader.c | 21 ++- > dwarves.h | 18 ++- > pahole.c | 28 ++-- > 5 files changed, 303 insertions(+), 110 deletions(-) >