Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs

On Wed, Aug 7, 2019 at 9:24 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> > Make .BTF section allocated and expose its contents through sysfs.
> >
> > The /sys/kernel/btf directory is created to contain all the BTFs present
> > inside the kernel. Currently there is only the kernel's main BTF, exposed as
> > the /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > each module will expose its BTF as a /sys/kernel/btf/<module-name> file.
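To illustrate what a consumer such as libbpf or bpftool would find in /sys/kernel/btf/kernel, here is a minimal C sketch that sanity-checks a raw BTF blob after it has been read into memory (e.g. with fread()). It assumes the file holds raw, native-endian BTF starting with the layout of struct btf_header from include/uapi/linux/btf.h; the names btf_hdr, BTF_MAGIC_VAL and btf_check_header are hypothetical, chosen only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct btf_header (include/uapi/linux/btf.h).
 * type_off/str_off are relative to the end of the header (buf + hdr_len). */
struct btf_hdr {
	uint16_t magic;    /* 0xeB9F in the producer's byte order */
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off;
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

#define BTF_MAGIC_VAL 0xeB9F

/* Returns 0 if buf looks like native-endian raw BTF whose type and string
 * sections both fit inside len bytes, -1 otherwise. Copies the header
 * into *out when out is non-NULL. */
int btf_check_header(const void *buf, size_t len, struct btf_hdr *out)
{
	struct btf_hdr h;

	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h)); /* avoid unaligned access to buf */
	if (h.magic != BTF_MAGIC_VAL || h.hdr_len < sizeof(h) || h.hdr_len > len)
		return -1;

	/* 64-bit arithmetic so off + len cannot wrap around */
	uint64_t data_len = len - h.hdr_len;
	if ((uint64_t)h.type_off + h.type_len > data_len ||
	    (uint64_t)h.str_off + h.str_len > data_len)
		return -1;

	if (out)
		*out = h;
	return 0;
}
```

A blob written by a kernel of the other endianness would show the magic byte-swapped (0x9FEB); this sketch simply rejects it.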
> >
> > Current approach relies on a few pieces coming together:
> > 1. pahole is used to take the almost final vmlinux image (modulo .BTF and
> >     kallsyms) and generate the .BTF section by converting DWARF info into
> >     BTF. This section is not allocated and not mapped to any segment,
> >     though, so it is not yet accessible from inside the kernel at runtime.
> > 2. objcopy dumps the .BTF contents into a binary file and subsequently
> >     converts that binary file into a linkable object file with automatically
> >     generated symbols _binary__btf_kernel_bin_start and
> >     _binary__btf_kernel_bin_end, pointing to the start and end, respectively,
> >     of the raw BTF data.
> > 3. the final vmlinux image is generated by linking this object file (and
> >     kallsyms, if necessary). sysfs_btf.c then creates the
> >     /sys/kernel/btf/kernel file and exposes the embedded BTF contents through
> >     it. This allows, e.g., libbpf and bpftool to access BTF info at a
> >     well-known location, without resorting to searching for the vmlinux image
> >     on disk (the location of which is not standardized, and the vmlinux image
> >     might not even be available in some scenarios, e.g., inside qemu
> >     during testing).
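The symbol names in step 2 are not picked by the script; they follow from how GNU objcopy names data converted with "-I binary": "_binary_", then the input file name, then "_start", "_end" or "_size", with every non-alphanumeric character replaced by '_'. The C sketch below reproduces that rule as implemented in binutils' BFD binary backend, to the best of my understanding; binary_symbol_name is a hypothetical helper, not part of objcopy or the kernel.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the symbol name that "objcopy -I binary FILE"
 * would emit for the given suffix ("start", "end" or "size").
 * Returns 0 on success, -1 if out is too small. */
int binary_symbol_name(const char *file, const char *suffix,
		       char *out, size_t outsz)
{
	int n = snprintf(out, outsz, "_binary_%s_%s", file, suffix);

	if (n < 0 || (size_t)n >= outsz)
		return -1; /* formatting error or truncated */

	/* every character that is not [A-Za-z0-9] becomes '_' */
	for (char *p = out; *p; p++)
		if (!isalnum((unsigned char)*p))
			*p = '_';
	return 0;
}
```

Since the whole input path is mangled, the file has to be passed to objcopy under a fixed relative name: under this rule, a path such as out/.btf.kernel.bin would yield _binary_out__btf_kernel_bin_start instead of the name sysfs_btf.c expects.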
> >
> > An alternative approach, using the .incbin assembler directive to embed BTF
> > contents directly, was attempted but didn't work, because sysfs_btf.o is
> > not re-compiled during the link-vmlinux.sh stage. Re-compiling it is required,
> > though, to update the embedded BTF data (initially, empty data is embedded;
> > then pahole generates BTF info and we would need to regenerate sysfs_btf.o
> > with the updated contents, but by that point it's too late).
> >
> > If BTF couldn't be generated due to a missing or too old pahole,
> > sysfs_btf.c handles that gracefully by detecting that
> > _binary__btf_kernel_bin_start (a weak symbol) is 0 and not creating
> > /sys/kernel/btf at all.
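The weak-symbol fallback can be reproduced in a userspace program. A minimal sketch, assuming GCC or Clang on an ELF target: an undefined weak reference resolves to address 0 instead of causing a link error, so its presence can be tested at runtime. This is not the actual sysfs_btf.c code, and embedded_btf_size is a hypothetical helper; only the two symbol names come from the patch description.

```c
#include <stddef.h>

/* Weak references: if no linked object defines these symbols (no BTF was
 * generated), the linker resolves them to 0 rather than failing. */
extern char _binary__btf_kernel_bin_start[] __attribute__((weak));
extern char _binary__btf_kernel_bin_end[] __attribute__((weak));

/* Returns the size of the embedded BTF blob, or 0 when none was linked in,
 * in which case a caller would skip creating /sys/kernel/btf. */
size_t embedded_btf_size(void)
{
	if (!_binary__btf_kernel_bin_start)
		return 0;
	return (size_t)(_binary__btf_kernel_bin_end -
			_binary__btf_kernel_bin_start);
}
```

Built on its own, without the objcopy-generated object, the program links fine and embedded_btf_size() reports no BTF.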
> >
> > v1->v2:
> > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> >
> > Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
> > Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> > Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> > Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
> > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> > ---

[...]

> > +
> > +     # dump .BTF section into raw binary file to link with final vmlinux
> > +     bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
> > +             cut -d, -f1 | cut -d' ' -f2)
> > +     ${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
> > +     ${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
> > +             --rename-section .data=.BTF .btf.kernel.bin ${2}
>
> Currently, the binary size on my config is about 2.6MB. Do you think
> we could or need to compress it to make it smaller? I tried gzip
> and the compressed size is 0.9MB.

I'd really prefer to keep it uncompressed for two main reasons:
- by having this in uncompressed form, the kernel itself can use this BTF
data from inside with almost no additional memory (except maybe for an
index from type ID to the actual location of type info), which opens up a
lot of new and interesting opportunities, like the kernel returning its
own BTF and BTF type ID for various types (think about driver metadata,
all those special maps, etc.).
- if we do compression, we now need to decide on the best
compression format, teach libbpf about it (which will make libbpf
bigger and dependent on extra libraries), etc.

So basically, compression would save 1-1.5MB of memory, but in exchange
we'd get a bunch of new problems we normally don't have to deal with.
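The index from type ID to type location mentioned in the first point is cheap to build over uncompressed BTF, because type IDs are implicit: the Nth record in the type section has ID N, and ID 0 is reserved for void. A minimal C sketch, assuming native-endian raw BTF and handling only kinds 1-15 (INT through DATASEC); all names are hypothetical, and offsets are relative to the type section, which starts hdr_len + type_off bytes into the blob.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of struct btf_type (include/uapi/linux/btf.h). */
struct btf_type_hdr {
	uint32_t name_off;
	uint32_t info;         /* vlen: bits 0-15, kind: bits 24+ */
	uint32_t size_or_type;
};

enum { K_INT = 1, K_PTR, K_ARRAY, K_STRUCT, K_UNION, K_ENUM, K_FWD,
       K_TYPEDEF, K_VOLATILE, K_CONST, K_RESTRICT, K_FUNC, K_FUNC_PROTO,
       K_VAR, K_DATASEC };

/* Bytes of kind-specific data following a btf_type; -1 if unknown kind. */
static long btf_extra_size(uint32_t info)
{
	uint32_t kind = (info >> 24) & 0x1f, vlen = info & 0xffff;

	switch (kind) {
	case K_PTR: case K_FWD: case K_TYPEDEF: case K_VOLATILE:
	case K_CONST: case K_RESTRICT: case K_FUNC:
		return 0;
	case K_INT: case K_VAR:
		return 4;               /* int encoding / struct btf_var */
	case K_ARRAY:
		return 12;              /* struct btf_array */
	case K_STRUCT: case K_UNION: case K_DATASEC:
		return 12L * vlen;      /* btf_member / btf_var_secinfo */
	case K_ENUM: case K_FUNC_PROTO:
		return 8L * vlen;       /* btf_enum / btf_param */
	default:
		return -1;
	}
}

/* Fills *out with a malloc'ed array: (*out)[id] = offset of type `id`.
 * Slot 0 (void) holds UINT32_MAX. Returns entry count or -1 on error. */
long btf_build_index(const uint8_t *types, uint32_t types_len, uint32_t **out)
{
	uint32_t cap = 16, n = 1, off = 0;
	uint32_t *idx = malloc(cap * sizeof(*idx));

	if (!idx)
		return -1;
	idx[0] = UINT32_MAX;
	while (off < types_len) {
		struct btf_type_hdr t;
		long extra;

		if (types_len - off < sizeof(t))
			goto err;       /* truncated record header */
		memcpy(&t, types + off, sizeof(t));
		extra = btf_extra_size(t.info);
		if (extra < 0 || (uint64_t)types_len - off - sizeof(t) < (uint64_t)extra)
			goto err;       /* unknown kind or truncated trailing data */
		if (n == cap) {
			uint32_t *tmp = realloc(idx, 2 * cap * sizeof(*idx));
			if (!tmp)
				goto err;
			idx = tmp;
			cap *= 2;
		}
		idx[n++] = off;
		off += sizeof(t) + (uint32_t)extra;
	}
	*out = idx;
	return n;
err:
	free(idx);
	return -1;
}
```

The index costs 4 bytes per type, which is the "almost no additional memory" part of the argument; a compressed blob would first have to be inflated into a separate buffer.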

>
> >   }
> >
> >   # Create ${2} .o file with all symbols from the ${1} object file
> > @@ -153,6 +164,7 @@ sortextable()
> >   # Delete output files in case of error
> >   cleanup()
> >   {

[...]