On Monday, October 21st, 2024 at 10:51 AM, Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > hi Ihor > > On 16/10/2024 01:10, Ihor Solodrai wrote: > > > btf_encoder__save_func() accumulates observations of DWARF functions, > > maintaining an elf_function per function name. > > > I've been struggling with the latter few patches in this series, and I > think that's in part due to the fact that you have to deal with some > (what I think are) unnecessary complications in the existing code. > > It should be easier to share ELF representations, but the situation > you've inherited is we mix immutable (ELF representation) and mutable > (function state information saved) representations, leading to > complications that require synchronization across threads. > > Stepping back, I think with a few simplifications, we can lay a better > foundation for your work, and BTF encoding in general. Specifically if we > > - always save and later add functions; currently we only save if we want > to avoid inconsistencies, but this means having to maintain two > codepaths for function addition which is messy. It is simpler to > always save and later see if we want to add functions. > - fully separate ELF representation (immutable) from saved function > represention (mutable). this will enable ELF sharing, and saved > functions state can simply point back at the appropriate ELF function > - For each encoder, we just save all function state representations in a > simple list; no merging is done at this point > - when adding saved functions, we combine lists from all encoders, merge > findings on inconsistent representations where required, and add all > functions without inconsistent representations. > > This will mean no concurrency issues, and we end up with a simpler > representation which I think should make ELF sharing much easier. I've > got a rough prototype of 3 prerequisite patches doing the above at > https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:elf-prep > > The changes are contained in the last 3 patches there if you want to > take a look. Hi Alan. Finally got time to try your changes. Apologies for delay, was busy with other things. TL;DR Here is my patchset rebased on top of your commits: * https://github.com/theihor/dwarves/pull/7 Please take a look, and let's sync on how do we plan to merge it in. Your commits seem to have debug code in them, so maybe you'd want to submit a clean version first. I must say earlier Eduard pointed out (off-list) the problems that you've described, but at the time I thought it would be too many changes if I tried to address them. It is indeed easier to move to a shared ELF functions table if all function states are collected before merging them, mostly because the need for thread synchronization disappears. I was able to effectively delete patch 4/5 of the series. One thing that bothered me is that now btf_encoder__add_saved_funcs() does more work (compared to v3 of the patchset). This is of course due to the fact that it is required to collect all function states from all encoders and group them by name, while it was done "automatically" when states were stored in elf_functions table. It's not good, because this step is single-threaded. However, I did some superficial measurements, and it appears btf_encoder__add_saved_funcs() step takes ~2% of the time when encoding vmlinux. So it's probably not worth complicating. Regarding "global variables" fork of our discussion [1], I didn't get any feedback from you or Arnaldo in response to suggested diff. I included a more thought out version in the branch [2], please take a look. See below some perf stats: marginal speedup and slower RSS growth with increasing number of threads. WIP v4 branch [2]: Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j23 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs): 83,054,827,477 cycles:u ( +- 0.29% ) 3.9829 +- 0.0374 seconds time elapsed ( +- 0.94% ) -j2: Maximum resident set size (kbytes): 783296 -j4: Maximum resident set size (kbytes): 866976 -j8: Maximum resident set size (kbytes): 992740 -j16: Maximum resident set size (kbytes): 1038788 -j32: Maximum resident set size (kbytes): 1169284 -j64: Maximum resident set size (kbytes): 1347232 dwarves/next [3] Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j23 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs): 87,748,403,977 cycles:u ( +- 0.23% ) 4.0570 +- 0.0240 seconds time elapsed ( +- 0.59% ) checking max rss -j2: Maximum resident set size (kbytes): 787256 -j4: Maximum resident set size (kbytes): 884360 -j8: Maximum resident set size (kbytes): 1018996 -j16: Maximum resident set size (kbytes): 1083528 -j32: Maximum resident set size (kbytes): 1279880 -j64: Maximum resident set size (kbytes): 1634656 [1]: https://lore.kernel.org/dwarves/4G5AFfVer_N_eJCZYc22pQM9rXbHOV2CZ4uOmqh4gFd1K2mgnbIDIZUpynMNCdJ-CEyvsBr0-cPdUzgNnM05NPkPjRdqdAnCAp8DrvUc-Iw=@pm.me/ [2]: https://github.com/theihor/dwarves/pull/7 [3]: https://github.com/theihor/dwarves/commit/729fd9963df576a04f2ba371b033c5300ebf0a91 > [...]