Re: BTF and libBPF

On Tue, Sep 13, 2022 at 10:05 PM Vincent Li <vincent.mc.li@xxxxxxxxx> wrote:
>
> On Mon, Sep 12, 2022 at 11:05 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Sep 12, 2022 at 8:41 PM Vincent Li <vincent.mc.li@xxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 12, 2022 at 5:17 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Sun, Sep 11, 2022 at 4:36 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Thanks for the quick response.
> > > > >
> > > > > > > Greeting,
> > > > > > >
> > > > > > > I have questions related to CONFIG_DEBUG_INFO_BTF, and  libbpf_0.8.1.
> > > > > > > Please kindly let me know if this is not the right group to ask, since I'm new.
> > > > > > >
> > > > > > > To give context of this question:
> > > > > > > This system has limited disk size, doesn't need the CO-RE feature,
> > > > > > > and has all debug symbols stripped in release build.   Having an extra
> > > > > > > btf/vmlinux file might be problematic, disk-wise.
> > > > >
> > > > > > Thanks for getting in touch - ideally I think we'd like to be
> > > > > > able to support BTF even on small systems. It would probably
> > > > > > help to understand what space constraints you have - is it just
> > > > > > disk space, or are disk space and memory highly constrained? The
> > > > > > mechanics of BTF are that it is generated and then embedded in the vmlinux
> > > > > > binary in a .BTF section. The BTF info is then exposed at runtime
> > > > > > via a /sys/kernel/btf/vmlinux pseudo-file.  So when assessing overhead,
> > > > > > there are two questions to ask I think:
> > > > >
> > > > > > > 1. how does BTF inclusion affect disk space?
> > > > > > > 2. how does BTF inclusion affect memory footprint?
> > > > >
> > > > > > For 1, on a recent bpf-next kernel, core vmlinux BTF is around 6Mb.
> > > > > > However, an important thing to bear in mind is that it is in the
> > > > > > vmlinux binary, that on most space-constrained systems is compressed
> > > > > > to /boot/vmlinuz-<VERSION>.  When I compress the BTF by hand, it reduces
> > > > > > by a factor of around 6, so a ballpark figure is around 1.5Mb of
> > > > > > the vmlinuz binary on-disk, which equates to around 10% of the overall
> > > > > > binary size in my case. Your results may vary, especially if
> > > > > > a lot of CONFIG options are switched off (as they might be on a
> > > > > > space-sensitive system).
> > > > >
> > > > > > For memory footprint, BTF will be extracted from the .BTF section
> > > > > > and will then take up around 6Mb.
> > > > >
> > > > > > Another piece of the puzzle is module BTF - it contains the
> > > > > > per-module type info not in the core kernel, but again if modules
> > > > > > are compressed, on-disk storage might not be a massive issue.
> > > > >
> > > > > > Anyway, hopefully the above gives you a sense for the kinds of
> > > > > > costs BTF has.
> > > > >
> > > > > Thank you. This information on disk and memory is really helpful.
> > > > > At this moment, I'm only looking at disk-size.
> > > > >
> > > > > > >
> > > > > > > Question 1>
> > > > > > > Will libbpf_0.8.1(or later) work with kernel 5.10 (or later),  without
> > > > > > > CONFIG_DEBUG_INFO_BTF ?
> > > > > > > Or work with kernel compiled with CONFIG_DEBUG_INFO_BTF but have
> > > > > > > /sys/kernel/btf/vmlinux removed.
> > > > > > >
> > > > >
> > > > > > It really depends on what you're planning on doing.
> > > > >
> > > > > > BTF has become central to a lot of aspects of BPF; higher-performance
> > > > > > fentry/fexit() BPF programs, CO-RE, and even XDP will be using BTF
> > > > > > soon I believe.
> > > > >
> > > > > > So if you're using BPF without BTF, there are generally ways to make
> > > > > > > things work (using kprobes instead of fentry for example), but you
> > > > > > > will have fewer options.  I seem to recall some fixes landed to
> > > > > > ensure that absence of BTF shouldn't prevent program loading in
> > > > > > cases where BTF is not needed. If you run into any such failures,
> > > > > > I'd suggest reporting them and hopefully we can get them fixed.
> > > > >
> > > > > > I have a follow-up question on how CO-RE uses BTF: where exactly does
> > > > > > the relocation happen?
> > > > > > In theory, it seems it can happen in two places: 1> in libBPF, in
> > > > > > user space, or 2> in the kernel.
> > > > >
> > > > > https://nakryiko.com/posts/bpf-portability-and-co-re/
> > > > > " It takes compiled BPF ELF object file, post-processes it as
> > > > > necessary, sets up various kernel objects (maps, programs, etc),
> > > > > and triggers BPF program loading and verification."
> > > > >
> > > > > > I assume there is a syscall to provide BTF information from the kernel to
> > > > > > user space, and libBPF uses that info to post-process the ELF file.
> > > > > >
> > > > > > Is there sample BPF code, with an explanation of the (rough) sequence of
> > > > > > actions done by libBPF, that I could look at?
> > > > > > And why do maps need to be relocated?
> > > > >
> > > > > 2>
> > > > > https://nakryiko.com/posts/bpf-core-reference-guide/ BTF-enabled BPF
> > > > > program types with direct memory reads
> > > > > > In this mode, is the kernel doing the relocation, or is it still libBPF?
> > > > > > For example, how and where is vma->vm_start relocated?
> > > > >
> > > > > > SEC("lsm/file_mprotect")
> > > > > > int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
> > > > > >              unsigned long reqprot, unsigned long prot, int ret)
> > > > > > {
> > > > > >     /* .. omit .. */
> > > > > >     int is_heap;
> > > > > >     is_heap = (vma->vm_start >= vma->vm_mm->start_brk &&
> > > > > >                vma->vm_end <= vma->vm_mm->brk);
> > > > > >     /* .. omit .. */
> > > > > > }
> > > > >
> > > > > Thanks
> > > > > Best Regards,
> > > > > Jeff Xu
> > > > >
> > > > >
> > > > > On Fri, Sep 9, 2022 at 8:29 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 09/09/2022 06:22, Jeff Xu wrote:
> > > > > > > Greeting,
> > > > > > >
> > > > > > > I have questions related to CONFIG_DEBUG_INFO_BTF, and  libbpf_0.8.1.
> > > > > > > Please kindly let me know if this is not the right group to ask, since I'm new.
> > > > > > >
> > > > > > > To give context of this question:
> > > > > > > This system has limited disk size, doesn't need the CO-RE feature,
> > > > > > > and has all debug symbols stripped in release build.   Having an extra
> > > > > > > btf/vmlinux file might be problematic, disk-wise.
> > > > > >
> > > > > > Thanks for getting in touch - ideally I think we'd like to be
> > > > > > able to support BTF even on small systems. It would probably
> > > > > > help to understand what space constraints you have - is it just
> > > > > > disk space, or are disk space and memory highly constrained? The
> > > > > > mechanics of BTF are that it is generated and then embedded in the vmlinux
> > > > > > binary in a .BTF section. The BTF info is then exposed at runtime
> > > > > > via a /sys/kernel/btf/vmlinux pseudo-file.  So when assessing overhead,
> > > > > > there are two questions to ask I think:
> > > > > >
> > > > > > > 1. how does BTF inclusion affect disk space?
> > > > > > > 2. how does BTF inclusion affect memory footprint?
> > > > > >
> > > > > > For 1, on a recent bpf-next kernel, core vmlinux BTF is around 6Mb.
> > > > > > However, an important thing to bear in mind is that it is in the
> > > > > > vmlinux binary, that on most space-constrained systems is compressed
> > > > > > to /boot/vmlinuz-<VERSION>.  When I compress the BTF by hand, it reduces
> > > > > > by a factor of around 6, so a ballpark figure is around 1.5Mb of
> > > > > > the vmlinuz binary on-disk, which equates to around 10% of the overall
> > > > > > binary size in my case. Your results may vary, especially if
> > > > > > a lot of CONFIG options are switched off (as they might be on a
> > > > > > space-sensitive system).
> > > > > >
> > > > > > For memory footprint, BTF will be extracted from the .BTF section
> > > > > > and will then take up around 6Mb.
> > > > > >
> > > > > > Another piece of the puzzle is module BTF - it contains the
> > > > > > per-module type info not in the core kernel, but again if modules
> > > > > > are compressed, on-disk storage might not be a massive issue.
> > > > > >
> > > > > > Anyway, hopefully the above gives you a sense for the kinds of
> > > > > > costs BTF has.
> > > > > >
> > > > > > >
> > > > > > > Question 1>
> > > > > > > Will libbpf_0.8.1(or later) work with kernel 5.10 (or later),  without
> > > > > > > CONFIG_DEBUG_INFO_BTF ?
> > > > > > > Or work with kernel compiled with CONFIG_DEBUG_INFO_BTF but have
> > > > > > > /sys/kernel/btf/vmlinux removed.
> > > > > > >
> > > > > >
> > > > > > It really depends what you're planning on doing.
> > > > > >
> > > > > > BTF has become central to a lot of aspects of BPF; higher-performance
> > > > > > fentry/fexit() BPF programs, CO-RE, and even XDP will be using BTF
> > > > > > soon I believe.
> > > > > >
> > > > > > So if you're using BPF without BTF, there are generally ways to make
> > > > > > things work (using kprobes instead of fentry for example), but you
> > > > > > > will have fewer options.  I seem to recall some fixes landed to
> > > > > > ensure that absence of BTF shouldn't prevent program loading in
> > > > > > cases where BTF is not needed. If you run into any such failures,
> > > > > > I'd suggest reporting them and hopefully we can get them fixed.
> > > > > >
> > > > > > > > Question 2: In terms of the debug information included at run time,
> > > > > > > > comparing (1) having btf/vmlinux vs (2) a kernel built with
> > > > > > > > CONFIG_DEBUG_INFO_DWARF5 but not stripped,
> > > > > > > > do the two contain the same amount of debug information at runtime?
> > > > > > >
> > > > > >
> > > > > > DWARF5 will contain more debug info, but will likely have a larger footprint
> > > > > > as a consequence. I'd suggest running the experiment yourself to compare.
> > > > > >
> > > > > > > > Question 3: Will libbpf + btf/vmlinux break the expectations of the kernel
> > > > > > > > ASLR feature? I assume it shouldn't, but would like to double check.
> > > > > > >
> > > > > >
> > > > > > Nope, no issue here that I'm aware of. I've used KASLR + BTF and haven't seen
> > > > > > any problems at least.
> > > > > >
> > > > > > > Thanks
> > > > > > > Best Regards,
> > > > > > > Jeff Xu
> > > > > > >
> > > >
> > > > Can I understand BTF usage this way?
> > > >
> > > > When BTF is available in the kernel at runtime, it helps in two ways:
> > > > 1> It is used by the BPF verifier (kernel) to find the offset of a member
> > > > in a struct (no libbpf modification of the bytecode is needed).
> > > > Example usages are BTF raw tracepoints and BPF_LSM.
> > > > Typically, those BPF programs include "vmlinux.h" and use C
> > > > pointer style (vma->vm_start).
> > > >
> > > > 2> It is used by libbpf (user space) to post-process the BPF bytecode.
> > > > Typically, those BPF programs don't need to include "vmlinux.h", and
> > > > use bpf_core_read, such as:
> > > > BPF_CORE_READ(vma, vm_start)
> > > >
> > > > I'd much appreciate confirmation of whether this is right or wrong.
> > > >
> > >
> > > This doesn't answer your question directly :) but from my limited
> > > understanding (which could be incorrect), BTF is processed at compile time
> > > and at load time, and the load-time processing is done by libbpf.
> > >
> > So even for a BTF raw tracepoint, the relocation happens in libbpf?
> > According to this post:
> > https://mozillazg.com/2022/06/ebpf-libbpf-btf-powered-enabled-raw-tracepoint-common-questions-en.html#hidthe-difference-between-btf-raw-tracepoint-and-raw-tracepoint
> >
> > // btf enabled
> > struct task_struct *task = (struct task_struct *) bpf_get_current_task_btf();
> > u32 ppid = task->real_parent->tgid;
> >
> > "The btf version can access kernel memory directly from within the ebpf program.
> > There is no need to use a helper function like bpf_core_read or
> > bpf_probe_read_kernel to access the kernel memory as in regular raw
> > tracepoint:"
> >
> > It talks about accessing kernel memory directly, so I read it as
> > the kernel doing the relocation.
> >
>
> Would this help?
> https://lore.kernel.org/bpf/20191016032505.2089704-6-ast@xxxxxxxxxx/
>
I'm not sure. But thanks.

Another way to look at it is through an objdump of the BPF bytecode:
1> Direct memory read.
SEC("lsm/bprm_committed_creds")
int BPF_PROG(handle_committed_creds, struct linux_binprm* binprm) {
  struct task_struct* task;
  task = (struct task_struct*)bpf_get_current_task_btf();
  return task->real_parent->tgid;
}

0000000000000000 <handle_committed_creds>:
       0: 85 00 00 00 9e 00 00 00 call 158
       1: 79 01 b0 05 00 00 00 00 r1 = *(u64 *)(r0 + 1456)
       2: 61 10 a4 05 00 00 00 00 r0 = *(u32 *)(r1 + 1444)
       3: 95 00 00 00 00 00 00 00 exit

2> relocate by libbpf.
SEC("lsm/bprm_committed_creds")
int BPF_PROG(handle_committed_creds, struct linux_binprm* binprm) {
  struct task_struct* task;
  task = (struct task_struct*)bpf_get_current_task();
  return BPF_CORE_READ(task,real_parent,tgid);
}

0000000000000000 <handle_committed_creds>:
       0: 85 00 00 00 23 00 00 00 call 35
       1: b7 01 00 00 b0 05 00 00 r1 = 1456
       2: 0f 10 00 00 00 00 00 00 r0 += r1
       3: bf a1 00 00 00 00 00 00 r1 = r10
       4: 07 01 00 00 f0 ff ff ff r1 += -16
       5: b7 02 00 00 08 00 00 00 r2 = 8
       6: bf 03 00 00 00 00 00 00 r3 = r0
       7: 85 00 00 00 71 00 00 00 call 113    <-------- (probably bpf_probe_read_kernel ?)
       8: b7 01 00 00 a4 05 00 00 r1 = 1444
       9: 79 a3 f0 ff 00 00 00 00 r3 = *(u64 *)(r10 - 16)
      10: 0f 13 00 00 00 00 00 00 r3 += r1
      11: bf a1 00 00 00 00 00 00 r1 = r10
      12: 07 01 00 00 fc ff ff ff r1 += -4
      13: b7 02 00 00 04 00 00 00 r2 = 4
      14: 85 00 00 00 71 00 00 00 call 113    <--------
      15: 61 a0 fc ff 00 00 00 00 r0 = *(u32 *)(r10 - 4)
      16: 95 00 00 00 00 00 00 00 exit
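
To read the dumps above, it may help to decode the raw bytes by hand. The following is a minimal sketch, assuming the usual little-endian eBPF encoding (1-byte opcode, 4-bit dst and src registers, 16-bit signed offset, 32-bit signed immediate). The names decoded_insn and decode_insn are made up for illustration and are not part of libbpf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded view of one 8-byte eBPF instruction. */
struct decoded_insn {
    uint8_t opcode;   /* byte 0 */
    uint8_t dst_reg;  /* low nibble of byte 1 */
    uint8_t src_reg;  /* high nibble of byte 1 */
    int16_t off;      /* bytes 2-3, little-endian, signed */
    int32_t imm;      /* bytes 4-7, little-endian, signed */
};

/* Decode byte by byte so the result does not depend on host endianness. */
static struct decoded_insn decode_insn(const uint8_t raw[8])
{
    struct decoded_insn d;

    assert(raw != NULL);
    d.opcode  = raw[0];
    d.dst_reg = raw[1] & 0x0f;
    d.src_reg = raw[1] >> 4;
    d.off = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
    d.imm = (int32_t)((uint32_t)raw[4] |
                      ((uint32_t)raw[5] << 8) |
                      ((uint32_t)raw[6] << 16) |
                      ((uint32_t)raw[7] << 24));
    return d;
}

/* Print the decoded fields of one instruction on a single line. */
static void print_insn(const uint8_t raw[8])
{
    struct decoded_insn d = decode_insn(raw);

    printf("opcode=0x%02x dst=r%u src=r%u off=%d imm=%d\n",
           (unsigned)d.opcode, (unsigned)d.dst_reg, (unsigned)d.src_reg,
           (int)d.off, (int)d.imm);
}
```

For example, the bytes `79 01 b0 05 00 00 00 00` decode to opcode 0x79, dst r1, src r0, off 1456, which matches `r1 = *(u64 *)(r0 + 1456)` in the first dump. The bytes `b7 01 00 00 b0 05 00 00` decode to imm 1456, which matches `r1 = 1456` in the second dump. So the same member offset shows up in the off field in one case and in the imm field in the other.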

For 1> (direct memory read),
the member offset is already in the code, so I assume no relocation is needed.

For 2> (BPF_CORE_READ),
my guess is that libbpf will relocate (patch) this code, for example
when the offset of "real_parent" changes within task_struct, and that
libbpf does this using information from an ELF section.
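
A rough way to picture that patching step is sketched below. It assumes a relocation record simply names an instruction index and the field offset that is valid on the running kernel. The struct field_reloc and the function apply_field_relocs are hypothetical and much simpler than what libbpf actually does (as far as I understand, the real CO-RE records live in the .BTF.ext section and describe the access by type and field path, not by a ready-made offset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One eBPF instruction, fields already decoded (simplified model). */
struct insn {
    uint8_t code;   /* opcode; the low 3 bits are the instruction class */
    uint8_t regs;   /* dst in the low nibble, src in the high nibble */
    int16_t off;    /* used by loads, e.g. r1 = *(u64 *)(r0 + off) */
    int32_t imm;    /* used by ALU ops, e.g. r1 = imm */
};

/* Hypothetical record: "instruction insn_idx holds a field offset". */
struct field_reloc {
    size_t  insn_idx;
    int32_t new_offset;  /* offset of the field on the target kernel */
};

#define CLASS(code)  ((code) & 0x07)
#define CLASS_LDX    0x01
#define CLASS_ALU64  0x07

/* Patch every referenced instruction; return 0 on success, -1 on error. */
static int apply_field_relocs(struct insn *prog, size_t n_insns,
                              const struct field_reloc *relocs,
                              size_t n_relocs)
{
    assert(prog != NULL && relocs != NULL);

    for (size_t i = 0; i < n_relocs; i++) {
        const struct field_reloc *r = &relocs[i];
        struct insn *in;

        if (r->insn_idx >= n_insns)
            return -1;               /* record points outside the program */
        in = &prog[r->insn_idx];

        switch (CLASS(in->code)) {
        case CLASS_ALU64:            /* e.g. r1 = 1456: offset is in imm */
            in->imm = r->new_offset;
            break;
        case CLASS_LDX:              /* e.g. r1 = *(u64 *)(r0 + 1456) */
            if (r->new_offset < INT16_MIN || r->new_offset > INT16_MAX)
                return -1;           /* does not fit the 16-bit off field */
            in->off = (int16_t)r->new_offset;
            break;
        default:
            return -1;               /* this sketch handles nothing else */
        }
    }
    return 0;
}
```

With this model, if real_parent moved from offset 1456 to, say, 1464 (a made-up number), a record {1, 1464} would rewrite instruction 1 of the second dump from `r1 = 1456` to `r1 = 1464`. The same idea would apply to the off field of a load such as `r1 = *(u64 *)(r0 + 1456)` if a record pointed at it.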

Thanks
Jeff.


