On 10/10/2024 01:36, Ihor Solodrai wrote:
> On Wednesday, October 9th, 2024 at 4:43 PM, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> [...]
>>
>> Do you have the performance / memory usage stats for next vs this patch-set?
>
> Hi Eduard.
>
> Yes, I ran perf stat, and looked at max memory as reported by
> `/usr/bin/time -v`. The difference is insignificant compared to
> acmel/dwarves:next (a1241b0) [1]. See below.
>
> In terms of speed I didn't expect an improvement. It might even have
> gotten worse due to potential encoder thread synchronization when
> accessing the elf_functions table. The table is now built once, but
> before the changes it was built once *per thread*.
>
> As for memory, no difference is a little surprising, as we now have one
> table instead of N (where N is the number of threads). But more stuff
> was added to elf_function, so I guess it ate all potential gains.
>
> Performance counter stats for './pahole -J -j8 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust ~/repo/bpf-dev-docker/linux/.tmp_vmlinux1' (31 runs):
>
> acmel/dwarves:next
>
>     68,904,383,016  cycles                    ( +- 0.30% )
>
>     5.5862 +- 0.0304 seconds time elapsed     ( +- 0.54% )
>
> vs this patchset
>
>     68,235,717,886  cycles                    ( +- 0.30% )
>
>     5.5550 +- 0.0412 seconds time elapsed     ( +- 0.74% )
>
> Memory on acmel/dwarves:next:
>     Maximum resident set size (kbytes): 1392640
>     Maximum resident set size (kbytes): 1394600
>     Maximum resident set size (kbytes): 1393788
>
> Memory on this patchset:
>     Maximum resident set size (kbytes): 1393564
>     Maximum resident set size (kbytes): 1394840
>     Maximum resident set size (kbytes): 1392348
>
> [1] https://github.com/acmel/dwarves/commit/a1241b095de948becfed882929dda7c4318e022a

Thanks for these stats! In general, I really like the direction; it also
fits neatly with our future plans around encoding additional info such as
function addresses; the previous approach of storing all info via name
matching wasn't ideal for that. I'll try it out and review the changes
tomorrow.

One thing I'm curious about: I presume the above stats are for
single-threaded peak memory utilization, right? If that is correct, how do
things scale as we add threads? I'd assume that since we're now sharing the
ELF function info, we should see a drop in peak memory utilization for
nthreads > 1 (as compared to the baseline next code)?

Another thing we'd encourage is running the tests; you can do this via

  vmlinux=/path/2/vmlinux ./tests/tests

There aren't too many there today, but we're working on growing the set of
tests. If you also set VERBOSE=1 you get a lot of extra info around
function encoding.

One thing I think we should be careful about is ensuring we get a similar
number of functions encoded with these changes as compared to baseline. I
don't see any major reason why we wouldn't, but it's good to check
regardless.

Thanks!

Alan
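
P.S. On the thread-scaling question above: one rough way to collect peak
RSS per thread count is a loop along these lines, reusing the command line
from your perf run (the vmlinux path here is just a placeholder; adjust to
taste):

  for n in 1 2 4 8; do
    echo "threads: $n"
    /usr/bin/time -v ./pahole -J -j$n \
      --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs \
      --btf_encode_detached=/dev/null --lang_exclude=rust \
      /path/2/vmlinux 2>&1 | grep 'Maximum resident set size'
  done

Comparing those numbers between baseline next and this patchset should show
whether sharing the elf_functions table pays off at higher thread counts.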
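
P.P.S. For the function-count check, one option (assuming bpftool is handy;
vmlinux.btf is just a stand-in name for a detached BTF output file) is to
encode with --btf_encode_detached=vmlinux.btf using both baseline and
patched pahole, then compare something like

  bpftool btf dump file vmlinux.btf | grep -c "] FUNC '"

between the two runs; the counts should be very close, if not identical.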