Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
> On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
> <stephen.s.brennan@xxxxxxxxxx> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
>> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
>> > <stephen.s.brennan@xxxxxxxxxx> wrote:
>> [...]
>> >> Future Work
>> >> -----------
>> >>
>> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
>> >> add a configuration option (default=n) for generating BTF with all
>> >> variables, which distributions could choose to enable or not.
>> >>
>> >> There was previous discussion[3] about leveraging split BTF or building
>> >> additional kernel modules to contain the extra variables. I believe with
>> >> this patch series, it is possible to do that. However, I'd argue that
>> >> simpler is better here: the advantage for using BTF is having it all
>> >> available in the kernel/module image. Storing extra BTF on the
>> >> filesystem would break that advantage, and at that point, you'd be
>> >> better off using a debuginfo format like CTF, which is lightweight and
>> >> expected to be found on the filesystem.
>> >
>> > With all or nothing approach the distros would have a hard choice
>> > to make whether to enable that kconfig, increase BTF and consume
>> > extra memory without any obvious reason or just don't do it.
>> > Majority probably is not going to enable it.
>> > So the feature will become a single vendor only and with
>> > inevitable bit-rot.
>>
>> I'd intend to support it even if just a single distribution enabled it.
>> But I do see your concern.
>
> This thread was dormant for 8 days.
> That's a poor example of "intend to support".

You're right, I definitely could have replied sooner. I'm sorry for that.

>> > Whereas with split BTF and extra kernel module approach
>> > we can enable BTF with all global vars by default.
>> > The extra module will be shipped by all distros and tools
>> > like bpftrace might start using it.
>>
>> Split BTF is currently limited to a single base BTF file. We'd need more
>> patches for pahole to support multiple --btf_base files: e.g.
>> vmlinux.btf and vmlinux-variables.btf. There's also the question of
>> modules: presumably we wouldn't try to have "$MODULE" and
>> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
>> space savings would be worth it.
>>
>> I can look into these concerns, but if possible I would like to proceed
>> with this series, as it is a separate concern from the exact mechanism
>> by which we include extra BTF into the kernel.
>
> Not an option. Sorry.

Ok, so let me describe what I understand to be the proposed design as of
the previous thread, and see if it satisfies your concerns. We can work
from there to make sure we've got a concensus design before going
further.

Option #1
---------

* A new module, "vmlinux-btf-extra" (or something roughly like that) is
  added, which contains BTF only. It is generated with
  --encode_all_btf_vars and uses --btf_base=path/to/vmlinux_btf so that
  it contains only BTF variables. The vmlinux BTF would be generated
  same as always (without the --encode_all_btf_vars).

* In the previous thread, it was proposed [1] that modules could
  include variables in their BTF in order to reduce the complexity of
  the change. Modules would have their BTF generated using
  --encode_all_btf_vars and --btf_base=path/to/vmlinux_btf. The
  resulting hierarchy would look like this:

  vmlinux_btf  [ functions and percpu vars only ]
  |- vmlinux-btf-extra [ all other vars for vmlinux ]
  |- $MODULE   [ functions and all vars ]
  ...

This option is desirable because it means that we only need 2-level
split BTF, and so we don't actually need to make changes to pahole for
multiple --btf_base files. There are two downsides I see:

(a) While we save space on vmlinux BTF, each module will have a bit of
    extra data for variable types. On my laptop (5.15 based) I have 9.8
    MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
    If we assume the same overhead of 23.7%, that would be 1.1 MB of
    extra module BTF for my particular use case.

    $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
    9876871
    $ ls -l /sys/kernel/btf/vmlinux
    -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux

(b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
    duplicate type definitions, wasting additional space. However, as
    far as I understand it, this was already a possibility, e.g.
    $MODULE1 and $MODULE2 could already contain duplicate types. So I
    think this downside is no more.


Option #2
---------

* The vmlinux-btf-extra module is still added as in Option #1.

* Further, each module would have its own "$MODULE-btf-extra" module to
  add in extra BTF. These would be built with a --btf_base=$MODULE.ko
  and of course that BTF is based on vmlinux, so we would have:

  vmlinux_btf              [ functions and percpu vars only ]
  |- vmlinux-btf-extras    [ all other vars for vmlinux ]
  |- $MODULE               [ functions and percpu vars only ]
     |- $MODULE-btf-extra  [ all  other vars for $MODULE ]

This is much more complex, pahole must be extended to support a
hierarchy of --btf_base files. The kernel itself may not need to
understand multi-level BTF since there's no requirement that it actually
understand $MODULE-btf-extra, so long as it exposes it via
/sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
mechanism to allow an administrator to say "please always load
$MODULE-btf-extras alongside $MODULE", but I think that would be a
userspace problem.

This resolves issue (a) from option #1, of course at implementation
cost.

Regardless of Option #1 or #2, I'd propose that we implement this as a
tristate, similar to what Alan proposed [2]. When set to "m" we use the
solutions described above, and when set to "y", we don't bother with it,
instead using --encode_all_btf_vars for all generation.

If we go with Option #1, no changes to this series should be necessary.
If we go with Option #2, I'll need to extend pahole to support at least
two BTF base files. Please let me know your thoughts.

Stephen

[1]: https://lore.kernel.org/bpf/CAEf4BzZmJKqXaJMBxhKqFNXzjO=eN5gk2xQVnmQVdK2xd3HQ=g@xxxxxxxxxxxxxx/
[2]: https://lore.kernel.org/bpf/alpine.LRH.2.23.451.2205032254390.10133@MyRouter/



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux