On Fri, Sep 9, 2022 at 12:31 PM Stephen Brennan
<stephen.s.brennan@xxxxxxxxxx> wrote:
>
> >> (a) While we save space on vmlinux BTF, each module will have a bit of
> >>     extra data for variable types. On my laptop (5.15 based) I have 9.8
> >>     MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
> >>     If we assume the same overhead of 23.7%, that would be 1.1 MB of
> >>     extra module BTF for my particular use case.
> >>
> >>     $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
> >>     9876871
> >>     $ ls -l /sys/kernel/btf/vmlinux
> >>     -r--r--r-- 1 root root 5174406 Sep 7 14:20 /sys/kernel/btf/vmlinux
> >>
> >> (b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
> >>     duplicate type definitions, wasting additional space. However, as
> >>     far as I understand it, this was already a possibility, e.g.
> >>     $MODULE1 and $MODULE2 could already contain duplicate types. So I
> >>     think this downside is no more.
> >
> > Both concerns are valid, but I'm a bit puzzled with (a).
> > At least in the networking drivers the number of global vars is very small.
> > I expected other drivers to be similar.
> > So having "functions and all vars" in ko-s should not add
> > that much overhead.
> >
> > Maybe you're seeing this overhead because pahole is adding
> > all declared vars and not only the vars that are actually present?
> > That would explain the discrepancy.
> > (b) with a bunch of duplicates is a sign that something is off as well.
>
> Sorry, I didn't actually have an analysis for module BTF, I was just
> extrapolating the result I had seen for vmlinux. I went ahead and did a
> proper test, generating BTF for a distribution kernel from Oracle Linux
> (kernel-uek-5.15.0-1.43.4.1.el9uek.x86_64) - something that I easily had
> on hand and could regenerate the BTF for quickly.
>
> Basically, the steps were:
>
>   pahole -J vmlinux --btf_encode_detached=vmlinux.btf
>   pahole -J vmlinux --btf_encode_detached=vmlinux.btf.all \
>       --encode_all_btf_vars
>
>   # For each module
>   pahole -J $MODULE --btf_encode_detached=$MODULE.btf \
>       --btf_base=vmlinux.btf
>   pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all \
>       --btf_base=vmlinux.btf --encode_all_btf_vars
>
>   # what if we based the module BTF on the "vmlinux.btf.all" instead?
>   pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all.all \
>       --btf_base=vmlinux.btf.all --encode_all_btf_vars
>
> And then using ls/awk to sum up the bytes of each BTF file. Results are:
>
> vmlinux:
>
>   -rw-r-----. 1 opc opc 4904193 Sep 9 18:58 vmlinux.btf
>   -rw-r-----. 1 opc opc 6534684 Sep 9 18:58 vmlinux.btf.all
>
> In this case there's a 33% increase in BTF size.
>
> modules:
>
>   $ ls -l *.btf | awk '{sum += $5} END {print(sum)}'
>   43979532
>   $ ls -l *.btf.all | awk '{sum += $5} END {print(sum)}'
>   44757792
>   $ ls -l *.btf.all.all | awk '{sum += $5} END {print(sum)}'
>   44696639
>
> So the "*.btf.all.all" modules were just an experiment to see if the
> extra data inside "vmlinux.btf.all" could reduce some duplication in
> module BTF. The answer was yes, but not enough to make up for the
> increase in the vmlinux BTF size.
>
> The "*.btf.all" modules are the ones we would actually expect to use in
> Option #1, where we have a vmlinux-btf-extras and the rest of the
> modules include their globals in their BTF sections directly, and are
> based off of the vmlinux BTF. This test shows that, on average, the
> module BTF size would grow by about 1.8% with Option #1. Of course the
> exact memory size that accounts for will vary by workload, depending on
> how many modules are loaded. But I'd imagine, assuming you have around 5MB
> of module BTF *actually loaded*, then the overhead would be around 85k
> bytes.
> I don't know how you feel, but I think that sounds
> acceptable, it's just 22 pages at 4k size :)
>
> Let me know how it sounds to you.
>
> Thanks,
> Stephen
>
> >>
> >>
> >> Option #2
> >> ---------
> >>
> >> * The vmlinux-btf-extra module is still added as in Option #1.
> >>
> >> * Further, each module would have its own "$MODULE-btf-extra" module to
> >>   add in extra BTF. These would be built with a --btf_base=$MODULE.ko
> >>   and of course that BTF is based on vmlinux, so we would have:
> >>
> >>   vmlinux_btf [ functions and percpu vars only ]
> >>   |- vmlinux-btf-extras [ all other vars for vmlinux ]
> >>   |- $MODULE [ functions and percpu vars only ]
> >>      |- $MODULE-btf-extra [ all other vars for $MODULE ]
> >>
> >> This is much more complex, pahole must be extended to support a
> >> hierarchy of --btf_base files. The kernel itself may not need to
> >> understand multi-level BTF since there's no requirement that it actually
> >> understand $MODULE-btf-extra, so long as it exposes it via
> >> /sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
> >> mechanism to allow an administrator to say "please always load
> >> $MODULE-btf-extras alongside $MODULE", but I think that would be a
> >> userspace problem.
> >>
> >> This resolves issue (a) from option #1, of course at implementation
> >> cost.
> >>
> >> Regardless of Option #1 or #2, I'd propose that we implement this as a
> >> tristate, similar to what Alan proposed [2]. When set to "m" we use the
> >> solutions described above, and when set to "y", we don't bother with it,
> >> instead using --encode_all_btf_vars for all generation.
> >>
> >> If we go with Option #1, no changes to this series should be necessary.
> >> If we go with Option #2, I'll need to extend pahole to support at least
> >> two BTF base files. Please let me know your thoughts.
> >
> > Completely agree that two level btf-extra needs quite a bit more work.
> > Before we proceed with option 2 let's figure out
> > the reason for extra space in option 1.

I don't think an extra module for each module, just to hold those
all-var BTFs, is acceptable, so Option #2 doesn't even seem like an
option. But given the very small increase in module BTF size when
variables are included, I think Option #1 is quite reasonable.
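
As a sanity check on the size argument, here is a small shell sketch that recomputes the relative growth from the byte counts quoted above. The helper name pct_increase and the variable names are only illustrative, and the 5 MiB "loaded module BTF" figure is the assumption from the quoted estimate, not a measurement.

```shell
#!/bin/sh
# Byte counts copied from the ls/awk output quoted in this thread.
vmlinux_base=4904193    # vmlinux.btf
vmlinux_all=6534684     # vmlinux.btf.all
mod_base=43979532       # sum of *.btf
mod_all=44757792        # sum of *.btf.all
mod_all_all=44696639    # sum of *.btf.all.all

# pct_increase BASE NEW: print the growth from BASE to NEW, in percent.
# (Hypothetical helper for this sketch; not part of pahole.)
pct_increase() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - base) * 100 / base }'
}

echo "vmlinux.btf.all vs vmlinux.btf: $(pct_increase "$vmlinux_base" "$vmlinux_all")%"
echo "*.btf.all       vs *.btf:       $(pct_increase "$mod_base" "$mod_all")%"
echo "*.btf.all.all   vs *.btf:       $(pct_increase "$mod_base" "$mod_all_all")%"

# Assumption: 5 MiB of module BTF actually loaded, 4 KiB pages.
loaded=$((5 * 1024 * 1024))
awk -v loaded="$loaded" -v base="$mod_base" -v new="$mod_all" 'BEGIN {
    extra = loaded * (new - base) / base
    printf "extra bytes for %d loaded: %.0f (%.1f pages of 4096)\n",
           loaded, extra, extra / 4096
}'
```

By this arithmetic the "*.btf.all" set comes out a little under 2% larger than the "*.btf" set, and "*.btf.all.all" grows slightly less than that, which fits the conclusion that the per-module overhead of Option #1 is small.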