On 8/8/19 10:47 AM, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 9:24 PM Yonghong Song <yhs@xxxxxx> wrote:
>>
>>
>>
>> On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
>>> Make .BTF section allocated and expose its contents through sysfs.
>>>
>>> /sys/kernel/btf directory is created to contain all the BTFs present
>>> inside kernel. Currently there is only kernel's main BTF, represented as
>>> /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
>>> each module will expose its BTF as /sys/kernel/btf/<module-name> file.
>>>
>>> Current approach relies on a few pieces coming together:
>>> 1. pahole is used to take almost final vmlinux image (modulo .BTF and
>>>    kallsyms) and generate .BTF section by converting DWARF info into
>>>    BTF. This section is not allocated and not mapped to any segment,
>>>    though, so is not yet accessible from inside kernel at runtime.
>>> 2. objcopy dumps .BTF contents into binary file and subsequently
>>>    convert binary file into linkable object file with automatically
>>>    generated symbols _binary__btf_kernel_bin_start and
>>>    _binary__btf_kernel_bin_end, pointing to start and end, respectively,
>>>    of BTF raw data.
>>> 3. final vmlinux image is generated by linking this object file (and
>>>    kallsyms, if necessary). sysfs_btf.c then creates
>>>    /sys/kernel/btf/kernel file and exposes embedded BTF contents through
>>>    it. This allows, e.g., libbpf and bpftool access BTF info at
>>>    well-known location, without resorting to searching for vmlinux image
>>>    on disk (location of which is not standardized and vmlinux image
>>>    might not be even available in some scenarios, e.g., inside qemu
>>>    during testing).
>>>
>>> Alternative approach using .incbin assembler directive to embed BTF
>>> contents directly was attempted but didn't work, because sysfs_proc.o is
>>> not re-compiled during link-vmlinux.sh stage.
>>> This is required, though,
>>> to update embedded BTF data (initially empty data is embedded, then
>>> pahole generates BTF info and we need to regenerate sysfs_btf.o with
>>> updated contents, but it's too late at that point).
>>>
>>> If BTF couldn't be generated due to missing or too old pahole,
>>> sysfs_btf.c handles that gracefully by detecting that
>>> _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
>>> /sys/kernel/btf at all.
>>>
>>> v1->v2:
>>> - allow kallsyms stage to re-use vmlinux generated by gen_btf();
>>>
>>> Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
>>> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>>> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
>>> Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
>>> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
>>> ---
>
> [...]
>
>>> +
>>> +	# dump .BTF section into raw binary file to link with final vmlinux
>>> +	bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
>>> +		cut -d, -f1 | cut -d' ' -f2)
>>> +	${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
>>> +	${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
>>> +		--rename-section .data=.BTF .btf.kernel.bin ${2}
>>
>> Currently, the binary size on my config is about 2.6MB. Do you think
>> we could or need to compress it to make it smaller? I tried gzip
>> and the compressed size is 0.9MB.
>
> I'd really prefer to keep it uncompressed for two main reasons:
> - by having this in uncompressed form, kernel itself can use this BTF
> data from inside with almost no additional memory (except maybe for
> index from type ID to actual location of type info), which opens up a
> lot of new and interesting opportunities, like kernel returning its
> own BTF and BTF type ID for various types (think about driver metadata,
> all those special maps, etc).
> - if we are doing compression, now we need to decide on best
> compression format, teach it libbpf (which will make libbpf also
> bigger and depending on extra libraries), etc.
>
> So basically, in exchange of 1-1.5MB extra memory we get a bunch of
> new problems we normally don't have to deal with.

Yes, I am aware of this tradeoff. Just to make sure this has been
discussed. I am totally fine with leaving it uncompressed.

>
>>
>>>  }
>>>
>>>  # Create ${2} .o file with all symbols from the ${1} object file
>>> @@ -153,6 +164,7 @@ sortextable()
>>>  # Delete output files in case of error
>>>  cleanup()
>>>  {
>
> [...]
>
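
For readers following the thread, the consumer side of the uncompressed approach can be sketched roughly as below. This is a minimal illustration, not code from the patch: it assumes the 24-byte `struct btf_header` layout and the 0xeB9F magic from the kernel's UAPI `linux/btf.h`, it uses the sysfs path proposed in this series as a default, and the helper names `parse_btf_header` and `load_kernel_btf` are hypothetical. A missing file is treated as "no BTF available", which mirrors the case where the kernel does not create /sys/kernel/btf at all.

```python
import struct
from pathlib import Path

BTF_MAGIC = 0xEB9F
# Fields of struct btf_header: magic (u16), version (u8), flags (u8),
# then hdr_len, type_off, type_len, str_off, str_len (all u32).
HEADER_FIELDS = ("magic", "version", "flags", "hdr_len",
                 "type_off", "type_len", "str_off", "str_len")
HEADER_FMT = "HBBIIIII"

# Path proposed in this patch series; treated here as an assumption.
SYSFS_BTF = Path("/sys/kernel/btf/kernel")


def parse_btf_header(raw: bytes) -> dict:
    """Decode the BTF header at the start of `raw`, detecting byte order
    from the magic value. Raises ValueError on short or non-BTF input."""
    for prefix, name in (("<", "little"), (">", "big")):
        fmt = prefix + HEADER_FMT
        if len(raw) < struct.calcsize(fmt):
            raise ValueError("buffer too short for a BTF header")
        values = struct.unpack_from(fmt, raw)
        if values[0] == BTF_MAGIC:
            header = dict(zip(HEADER_FIELDS, values))
            header["endian"] = name
            return header
    raise ValueError("bad BTF magic")


def load_kernel_btf(path=SYSFS_BTF):
    """Return the raw BTF bytes, or None if the kernel exposes no BTF."""
    try:
        return Path(path).read_bytes()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    blob = load_kernel_btf()
    if blob is None:
        print("kernel BTF not exposed via sysfs")
    else:
        print(parse_btf_header(blob))
```

Because the blob is stored uncompressed, the sketch needs nothing beyond a plain file read and a fixed-size header decode, which is the dependency argument made in the reply above.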