Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs

On 8/8/19 10:47 AM, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 9:24 PM Yonghong Song <yhs@xxxxxx> wrote:
>>
>>
>>
>> On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
>>> Make .BTF section allocated and expose its contents through sysfs.
>>>
>>> /sys/kernel/btf directory is created to contain all the BTFs present
>>> inside kernel. Currently there is only kernel's main BTF, represented as
>>> /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
>>> each module will expose its BTF as /sys/kernel/btf/<module-name> file.
>>>
>>> Current approach relies on a few pieces coming together:
>>> 1. pahole is used to take almost final vmlinux image (modulo .BTF and
>>>      kallsyms) and generate .BTF section by converting DWARF info into
>>>      BTF. This section is not allocated and not mapped to any segment,
>>>      though, so is not yet accessible from inside kernel at runtime.
>>> 2. objcopy dumps .BTF contents into a binary file and subsequently
>>>      converts that binary file into a linkable object file with automatically
>>>      generated symbols _binary__btf_kernel_bin_start and
>>>      _binary__btf_kernel_bin_end, pointing to start and end, respectively,
>>>      of BTF raw data.
>>> 3. final vmlinux image is generated by linking this object file (and
>>>      kallsyms, if necessary). sysfs_btf.c then creates
>>>      /sys/kernel/btf/kernel file and exposes embedded BTF contents through
>>>      it. This allows, e.g., libbpf and bpftool to access BTF info at a
>>>      well-known location, without resorting to searching for vmlinux image
>>>      on disk (location of which is not standardized and vmlinux image
>>>      might not be even available in some scenarios, e.g., inside qemu
>>>      during testing).
>>>
>>> Alternative approach using .incbin assembler directive to embed BTF
>>> contents directly was attempted but didn't work, because sysfs_btf.o is
>>> not re-compiled during link-vmlinux.sh stage. This is required, though,
>>> to update embedded BTF data (initially empty data is embedded, then
>>> pahole generates BTF info and we need to regenerate sysfs_btf.o with
>>> updated contents, but it's too late at that point).
>>>
>>> If BTF couldn't be generated due to missing or too old pahole,
>>> sysfs_btf.c handles that gracefully by detecting that
>>> _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
>>> /sys/kernel/btf at all.
>>>
>>> v1->v2:
>>> - allow kallsyms stage to re-use vmlinux generated by gen_btf();
>>>
>>> Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
>>> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>>> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
>>> Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
>>> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
>>> ---
> 
> [...]
> 
>>> +
>>> +     # dump .BTF section into raw binary file to link with final vmlinux
>>> +     bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
>>> +             cut -d, -f1 | cut -d' ' -f2)
>>> +     ${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
>>> +     ${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
>>> +             --rename-section .data=.BTF .btf.kernel.bin ${2}
>>
>> Currently, the binary size on my config is about 2.6MB. Do you think
>> we could, or need to, compress it to make it smaller? I tried gzip
>> and the compressed size is 0.9MB.
> 
> I'd really prefer to keep it uncompressed for two main reasons:
> - by having this in uncompressed form, kernel itself can use this BTF
> data from inside with almost no additional memory (except maybe for
> index from type ID to actual location of type info), which opens up a
> lot of new and interesting opportunities, like kernel returning its
> own BTF and BTF type ID for various types (think about driver metadata,
> all those special maps, etc).
> - if we are doing compression, now we need to decide on best
> compression format, teach it to libbpf (which will make libbpf also
> bigger and dependent on extra libraries), etc.
> 
> So basically, in exchange for saving 1-1.5MB of memory we get a bunch of
> new problems we normally don't have to deal with.

Yes, I am aware of this tradeoff. I just wanted to make sure it had been
discussed. I am totally fine with leaving it uncompressed.
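
To illustrate the first point above, here is a minimal sketch (not part of
the patch) of how a userspace consumer could interpret uncompressed BTF data
in place, with no decompression step. The field layout follows struct
btf_header from include/uapi/linux/btf.h. The helper names
parse_btf_header() and make_sample_btf() are made up for illustration,
little-endian data is assumed, and the sysfs path is the one proposed in
this patch.

```python
import struct

BTF_MAGIC = 0xEB9F  # magic value of struct btf_header

# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields
# (hdr_len, type_off, type_len, str_off, str_len).
# Assumption: little-endian data; BTF is stored in the target's byte order.
BTF_HEADER = struct.Struct("<HBBIIIII")
FIELDS = ("magic", "version", "flags", "hdr_len",
          "type_off", "type_len", "str_off", "str_len")


def parse_btf_header(raw):
    """Decode the fixed-size header at the start of raw BTF data."""
    if len(raw) < BTF_HEADER.size:
        raise ValueError("data too short for a BTF header")
    hdr = dict(zip(FIELDS, BTF_HEADER.unpack_from(raw)))
    if hdr["magic"] != BTF_MAGIC:
        raise ValueError("bad BTF magic: %#x" % hdr["magic"])
    return hdr


def make_sample_btf():
    """Hypothetical minimal blob: header, no types, one-byte string section."""
    strings = b"\0"
    return BTF_HEADER.pack(BTF_MAGIC, 1, 0, BTF_HEADER.size,
                           0, 0, 0, len(strings)) + strings


if __name__ == "__main__":
    import os
    import sys

    # Path as proposed in this patch; fall back to the sample blob if absent.
    path = sys.argv[1] if len(sys.argv) > 1 else "/sys/kernel/btf/kernel"
    if os.path.exists(path):
        with open(path, "rb") as f:
            data = f.read()
    else:
        data = make_sample_btf()
    print(parse_btf_header(data))
```

With a compressed blob, the same consumer would first have to inflate the
whole file, which is where the extra library dependency would come from.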

> 
>>
>>>    }
>>>
>>>    # Create ${2} .o file with all symbols from the ${1} object file
>>> @@ -153,6 +164,7 @@ sortextable()
>>>    # Delete output files in case of error
>>>    cleanup()
>>>    {
> 
> [...]
> 



