On 10/10/2024 21:31, Alan Maguire wrote:
> On 10/10/2024 01:36, Ihor Solodrai wrote:
>> On Wednesday, October 9th, 2024 at 4:43 PM, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>>
>> [...]
>>>
>>> Do you have the performance / memory usage stats for next vs this patch-set?
>>
>> Hi Eduard.
>>
>> Yes, I ran perf stat, and looked at max memory as reported by
>> `/usr/bin/time -v`. The difference is insignificant compared to
>> acmel/dwarves:next (a1241b0) [1]. See below.
>>
>> In terms of speed I didn't expect an improvement. It might have
>> even gotten worse due to potential encoder threads synchronization
>> when accessing elf_functions table. The table is now built once, but
>> before the changes it was built once *per thread*.
>>
>> As for memory, no difference is a little surprising as we now have one
>> table instead of N (where N is number of threads). But more stuff was
>> added to elf_function, so I guess it ate all potential gains.
>>
>>
>> Performance counter stats for './pahole -J -j8 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust ~/repo/bpf-dev-docker/linux/.tmp_vmlinux1' (31 runs):
>>
>> acmel/dwarves:next
>>
>> 68,904,383,016 cycles ( +- 0.30% )
>>
>> 5.5862 +- 0.0304 seconds time elapsed ( +- 0.54% )
>>
>> vs this patchset
>>
>> 68,235,717,886 cycles ( +- 0.30% )
>>
>> 5.5550 +- 0.0412 seconds time elapsed ( +- 0.74% )
>>
>> Memory on acmel/dwarves:next
>> Maximum resident set size (kbytes): 1392640
>> Maximum resident set size (kbytes): 1394600
>> Maximum resident set size (kbytes): 1393788
>>
>> Memory on this patchset:
>> Maximum resident set size (kbytes): 1393564
>> Maximum resident set size (kbytes): 1394840
>> Maximum resident set size (kbytes): 1392348
>>
>> [1] https://github.com/acmel/dwarves/commit/a1241b095de948becfed882929dda7c4318e022a
>>
>>
>
> Thanks for these stats!
> In general, I really like the direction; it also
> fits neatly with our future plans around encoding additional info like
> function addresses; the previous approach of storing all info via name
> matching wasn't ideal for that. I'll try it out and review the changes
> tomorrow.
>
> One thing I'm curious about; I presume the above stats are for
> single-threaded peak memory utilization, right? If that is correct, how
> do things scale as we add threads? I'd assume that since we're now
> sharing ELF function info, we should see a drop in peak memory
> utilization for nthreads > 1 (as compared to the baseline next code)?
>

I ran some experiments here and saw no significant difference in peak
memory utilization; this may indicate that peak memory utilization is a
function of something else. Comparing peak memory utilization between
1-threaded and 8-threaded encoding, a similar pattern is observed for
baseline next and this series. For baseline, 1 vs 8 threads, we see:

< Maximum resident set size (kbytes): 1069304
---
> Maximum resident set size (kbytes): 1119412

...while for this series, 1 vs 8 threads, we see:

< Maximum resident set size (kbytes): 1071148
---
> Maximum resident set size (kbytes): 1125052

So pretty similar really. Maybe a system with a larger number of
processors would reveal something more here and show the benefits of
shared ELF representations.

> Another thing we're encouraging is running the tests; you can do this
> via
>
> vmlinux=/path/2/vmlinux ./tests/tests
>
> Not too many there today, but we're working on growing the set of tests.
> If you set VERBOSE=1 too you can get a lot of extra info around function
> encoding. One thing I think we should be careful about is to ensure we
> get a similar number of functions encoded with these changes as compared
> to baseline. I don't see any major reason why we wouldn't, but good to
> check regardless. Thanks!
>

I tried this too, and compared verbose output of baseline and test
btf_functions; both were identical:

$ VERBOSE=1 vmlinux=/home/almagui/kbuild/bpf-next/vmlinux bash btf_functions.sh > /var/tmp/btf_functions.baseline
$ VERBOSE=1 vmlinux=/home/almagui/kbuild/bpf-next/vmlinux bash btf_functions.sh > /var/tmp/btf_functions.test
$ diff /var/tmp/btf_functions.baseline /var/tmp/btf_functions.test
$

This suggests the results are the same since we encode the same number
of functions, refuse to encode the same number of inconsistent functions
etc. Which is great!

Alan
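
For anyone wanting to repeat the 1 vs 8 thread peak-memory comparison discussed in this thread, here is a minimal sketch of how it could be scripted. It is not the exact procedure used for the numbers in this thread. The `PAHOLE` and `VMLINUX` variables and the `peak_rss_kb`/`measure` helpers are hypothetical names introduced for illustration. Only the `-J`, `-jN` and `--btf_encode_detached` options and `/usr/bin/time -v` come from the commands quoted in the thread.

```shell
#!/bin/bash
# Sketch: report pahole's peak RSS when BTF encoding with 1 and 8 threads.
# PAHOLE and VMLINUX are placeholders (assumptions), set by the caller.
PAHOLE=${PAHOLE:-./pahole}

# Extract the peak RSS value (kbytes) from `/usr/bin/time -v` output on stdin.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Encode BTF with the given thread count and print the peak RSS in kbytes.
# time(1) writes its report to stderr, so stderr is piped and stdout dropped.
measure() {
    local threads=$1
    /usr/bin/time -v "$PAHOLE" -J -j"$threads" \
        --btf_encode_detached=/dev/null "$VMLINUX" 2>&1 >/dev/null | peak_rss_kb
}

# Only run the measurement when a vmlinux path has been provided.
if [ -n "${VMLINUX:-}" ]; then
    for n in 1 8; do
        echo "-j$n: $(measure "$n") kbytes"
    done
fi
```

Running it once on baseline next and once on this series, e.g. with `VMLINUX=/path/2/vmlinux PAHOLE=./pahole`, gives the two pairs of numbers to compare. A single run per configuration is noisy, so repeating each measurement a few times is advisable.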