Re: [PATCH v2 dwarves 0/5] btf_encoder: implement shared elf_functions table

Alan Maguire <alan.maguire@xxxxxxxxxx> · Fri, 11 Oct 2024 17:27:35 +0100

On 10/10/2024 21:31, Alan Maguire wrote:
> On 10/10/2024 01:36, Ihor Solodrai wrote:
>> On Wednesday, October 9th, 2024 at 4:43 PM, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>>
>> [...]
>>>
>>> Do you have the performance / memory usage stats for next vs this patch-set?
>>
>> Hi Eduard.
>>
>> Yes, I ran perf stat, and looked at max memory as reported by
>> `/usr/bin/time -v`. The difference is insignificant compared to
>> acmel/dwarves:next (a1241b0) [1]. See below.
>>
>> In terms of speed I didn't expect an improvement. It might have 
>> even gotten worse due to potential encoder threads synchronization
>> when accessing elf_functions table. The table is now built once, but
>> before the changes it was built once *per thread*.
>>
>> As for memory, no difference is a little surprising as we now have one
>> table instead of N (where N is number of threads). But more stuff was
>> added to elf_function, so I guess it ate all potential gains.
>>
>>
>> Performance counter stats for './pahole -J -j8 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust ~/repo/bpf-dev-docker/linux/.tmp_vmlinux1' (31 runs):
>>
>> acmel/dwarves:next
>>
>>     68,904,383,016      cycles                                                                  ( +-  0.30% )
>>
>>             5.5862 +- 0.0304 seconds time elapsed  ( +-  0.54% )
>>
>> vs this patchset
>>
>>     68,235,717,886      cycles                                                                  ( +-  0.30% )
>>
>>             5.5550 +- 0.0412 seconds time elapsed  ( +-  0.74% )
>>
>> Memory on acmel/dwarves:next
>>        Maximum resident set size (kbytes): 1392640
>>        Maximum resident set size (kbytes): 1394600
>>        Maximum resident set size (kbytes): 1393788
>>
>> Memory on this patchset:
>>        Maximum resident set size (kbytes): 1393564
>>        Maximum resident set size (kbytes): 1394840
>>        Maximum resident set size (kbytes): 1392348
>>
>> [1] https://github.com/acmel/dwarves/commit/a1241b095de948becfed882929dda7c4318e022a
>>
>>
> 
> Thanks for these stats! In general, I really like the direction; it also
> fits neatly with our future plans around encoding additional info like
> function addresses; the previous approach of storing all info via name
> matching wasn't ideal for that. I'll try it out and review the changes
> tomorrow.
> 
> One thing I'm curious about; I presume the above stats are for
> single-threaded peak memory utilization, right? If that is correct, how
> do things scale as we add threads? I'd assume that since we're now
> sharing ELF function info, we should see a drop in peak memory
> utilization for nthreads > 1 (as compared to the baseline next code)?
>

I did some experiments here and saw no significant difference in peak
memory utilization; this may indicate that peak memory utilization is a
function of something else. Comparing 1-threaded and 8-threaded
encoding, a similar pattern is observed for baseline next and for this
series. For baseline, 1 vs 8 threads, we see:

< 	Maximum resident set size (kbytes): 1069304
---
> 	Maximum resident set size (kbytes): 1119412

...while for this series, 1 vs 8 threads, we see:

< 	Maximum resident set size (kbytes): 1071148
---
> 	Maximum resident set size (kbytes): 1125052
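
To put a rough number on the growth from 1 to 8 threads, here is a
minimal sketch of the arithmetic using the figures above. The helper
name rss_growth_pct is made up for illustration; it is not part of
pahole or the tests.

```shell
#!/bin/sh
# Hypothetical helper (not part of dwarves): percentage growth in peak RSS
# going from the 1-thread figure ($1) to the 8-thread figure ($2), in kbytes.
rss_growth_pct() {
    awk -v one="$1" -v many="$2" \
        'BEGIN { printf "%.2f\n", (many - one) * 100 / one }'
}

rss_growth_pct 1069304 1119412   # baseline next, 1 vs 8 threads
rss_growth_pct 1071148 1125052   # this series, 1 vs 8 threads
```

That works out to roughly 4.7% growth for baseline and 5.0% for this
series.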

So pretty similar really. Maybe a system with a larger number of
processors would reveal something more here and show the benefits of
shared ELF representations.
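
For anyone wanting to repeat the comparison on a bigger machine, here is
a rough sketch of how the peak RSS figures can be collected. It assumes
GNU time at /usr/bin/time and that pahole accepts -jN for each thread
count tried (only -j8 appears in this thread); the function names and
the vmlinux path in the usage comment are placeholders.

```shell
#!/bin/sh
# Pull the peak RSS (in kbytes) out of `/usr/bin/time -v` output on stdin.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Run a command under GNU time and print only its peak RSS.
# The command's stdout is discarded; time's report arrives on stderr.
measure_peak_rss() {
    /usr/bin/time -v "$@" 2>&1 >/dev/null | peak_rss_kb
}

# Example (placeholder path; flags taken from the perf stat run quoted above):
#   for n in 1 8 16 32; do
#       printf '%s threads: ' "$n"
#       measure_peak_rss ./pahole -J -j"$n" \
#           --btf_encode_detached=/dev/null /path/2/vmlinux
#   done
```

Running the loop once on baseline next and once on this series gives
two columns that can be compared directly.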

> Another thing we're encouraging is running the tests; you can do this
> via
> 
> vmlinux=/path/2/vmlinux ./tests/tests
> 
> Not too many there today, but we're working on growing the set of tests.
> If you set VERBOSE=1 too you can get a lot of extra info around function
> encoding. One thing I think we should be careful about is to ensure we
> get a similar number of functions encoded with these changes as compared
> to baseline. I don't see any major reason why we wouldn't, but good to
> check regardless. Thanks!
>

I tried this too, and compared verbose output of baseline and test
btf_functions; both were identical:

$ VERBOSE=1 vmlinux=/home/almagui/kbuild/bpf-next/vmlinux bash \
    btf_functions.sh > /var/tmp/btf_functions.baseline
$ VERBOSE=1 vmlinux=/home/almagui/kbuild/bpf-next/vmlinux bash \
    btf_functions.sh > /var/tmp/btf_functions.test
$ diff /var/tmp/btf_functions.baseline /var/tmp/btf_functions.test
$

This suggests the results are the same: we encode the same number of
functions, refuse to encode the same number of inconsistent functions,
and so on. Which is great!
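
If this check gets repeated across several builds, the comparison can be
wrapped up along these lines. This is only a sketch; outputs_match is a
made-up name, and it assumes nothing about the format of the verbose
output beyond it being text.

```shell
#!/bin/sh
# Hypothetical helper: report whether two saved outputs are byte-identical,
# and show the differences when they are not.
outputs_match() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "differ"
        diff "$1" "$2"
        return 1
    fi
}

# Example, with the files produced above:
#   outputs_match /var/tmp/btf_functions.baseline /var/tmp/btf_functions.test
```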

Alan