On Wed, Aug 7, 2019 at 11:08 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Aug 08, 2019 at 04:24:25AM +0000, Yonghong Song wrote: > > > > > > On 8/7/19 5:32 PM, Andrii Nakryiko wrote: > > > Make .BTF section allocated and expose its contents through sysfs. > > Was this original patch not on bpf@vger? I can't find it in my > archive. Anyway... > > > > /sys/kernel/btf directory is created to contain all the BTFs present > > > inside kernel. Currently there is only kernel's main BTF, represented as > > > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported, > > > each module will expose its BTF as /sys/kernel/btf/<module-name> file. > > Why are you using sysfs for this? Who uses "BTF"s? Are these debugging > images that only people working on developing bpf programs are going to > need, or are these things that you are going to need on a production > system? We need it in production system. One immediate and direct use case is BPF CO-RE (Compile Once - Run Everywhere), which aims to allow to pre-compile BPF applications (even those that read internal kernel structures) using any local kernel headers, and then distribute and run them in binary form on all target production machines without dependencies on kernel headers and having Clang on target machine to compile C to BPF IR. Libbpf is doing all those adjustments/relocations based on kernel's actual BTF. See [0] for a summary and slides, if you curious to learn more. [0] http://vger.kernel.org/bpfconf2019.html#session-2 > > I ask as maybe debugfs is the best place for this if they are not needed > on production systems. > > > > > > > > Current approach relies on a few pieces coming together: > > > 1. pahole is used to take almost final vmlinux image (modulo .BTF and > > > kallsyms) and generate .BTF section by converting DWARF info into > > > BTF. This section is not allocated and not mapped to any segment, > > > though, so is not yet accessible from inside kernel at runtime. > > > 2. objcopy dumps .BTF contents into binary file and subsequently > > > convert binary file into linkable object file with automatically > > > generated symbols _binary__btf_kernel_bin_start and > > > _binary__btf_kernel_bin_end, pointing to start and end, respectively, > > > of BTF raw data. > > > 3. final vmlinux image is generated by linking this object file (and > > > kallsyms, if necessary). sysfs_btf.c then creates > > > /sys/kernel/btf/kernel file and exposes embedded BTF contents through > > > it. This allows, e.g., libbpf and bpftool access BTF info at > > > well-known location, without resorting to searching for vmlinux image > > > on disk (location of which is not standardized and vmlinux image > > > might not be even available in some scenarios, e.g., inside qemu > > > during testing). > > > > > > Alternative approach using .incbin assembler directive to embed BTF > > > contents directly was attempted but didn't work, because sysfs_proc.o is > > > not re-compiled during link-vmlinux.sh stage. This is required, though, > > > to update embedded BTF data (initially empty data is embedded, then > > > pahole generates BTF info and we need to regenerate sysfs_btf.o with > > > updated contents, but it's too late at that point). > > > > > > If BTF couldn't be generated due to missing or too old pahole, > > > sysfs_btf.c handles that gracefully by detecting that > > > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating > > > /sys/kernel/btf at all. > > > > > > v1->v2: > > > - allow kallsyms stage to re-use vmlinux generated by gen_btf(); > > > > > > Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx> > > > Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> > > > Cc: Jiri Olsa <jolsa@xxxxxxxxxx> > > > Cc: Sam Ravnborg <sam@xxxxxxxxxxxx> > > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> > > > --- > > > kernel/bpf/Makefile | 3 +++ > > > kernel/bpf/sysfs_btf.c | 52 ++++++++++++++++++++++++++++++++++++++ > > > scripts/link-vmlinux.sh | 55 +++++++++++++++++++++++++++-------------- > > > 3 files changed, 91 insertions(+), 19 deletions(-) > > > create mode 100644 kernel/bpf/sysfs_btf.c > > First rule, you can't create new sysfs files without a matching > Documentation/ABI/ set of entries. Please do that for the next version > of this patch so we can properly check to see if what you are > documenting lines up with the code. Otherwise we just have to guess as > to what the entries you are creating actually do. Yep, sure, I wasn't aware, will add in v3. > > > > > > > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > > > index 29d781061cd5..e1d9adb212f9 100644 > > > --- a/kernel/bpf/Makefile > > > +++ b/kernel/bpf/Makefile > > > @@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o > > > ifeq ($(CONFIG_INET),y) > > > obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o > > > endif > > > +ifeq ($(CONFIG_SYSFS),y) > > > +obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o > > > +endif > > > diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c > > > new file mode 100644 > > > index 000000000000..ac06ce1d62e8 > > > --- /dev/null > > > +++ b/kernel/bpf/sysfs_btf.c > > > @@ -0,0 +1,52 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* > > > + * Provide kernel BTF information for introspection and use by eBPF tools. > > > + */ > > > +#include <linux/kernel.h> > > > +#include <linux/module.h> > > > +#include <linux/kobject.h> > > > +#include <linux/init.h> > > > + > > > +/* See scripts/link-vmlinux.sh, gen_btf() func for details */ > > > +extern char __weak _binary__btf_kernel_bin_start[]; > > > +extern char __weak _binary__btf_kernel_bin_end[]; > > > + > > > +static ssize_t > > > +btf_kernel_read(struct file *file, struct kobject *kobj, > > > + struct bin_attribute *bin_attr, > > > + char *buf, loff_t off, size_t len) > > > +{ > > > + memcpy(buf, _binary__btf_kernel_bin_start + off, len); > > > + return len; > > > +} > > > + > > > +static struct bin_attribute btf_kernel_attr __ro_after_init = { > > > + .attr = { > > > + .name = "kernel", > > > + .mode = 0444, > > > + }, > > > + .read = btf_kernel_read, > > > +}; > > BIN_ATTR_RO()? Ok, will use that. > > > > + > > > +static struct bin_attribute *btf_attrs[] __ro_after_init = { > > > + &btf_kernel_attr, > > > + NULL, > > > +}; > > > + > > > +static struct attribute_group btf_group_attr __ro_after_init = { > > > + .name = "btf", > > > + .bin_attrs = btf_attrs, > > > +}; > > > + > > > +static int __init btf_kernel_init(void) > > > +{ > > > + if (!_binary__btf_kernel_bin_start) > > > + return 0; > > > + > > > + btf_kernel_attr.size = _binary__btf_kernel_bin_end - > > > + _binary__btf_kernel_bin_start; > > > + > > > + return sysfs_create_group(kernel_kobj, &btf_group_attr); > > You are nesting directories here without a "real" kobject in the middle. > Are you _sure_ you want to do that? It's going to get really tricky > later on based on your comments above about creating multiple files in > that directory over time once "modules" are allowed. My thinking was that when we have BTF for modules, I'll need to do some code adjustments anyway, at which point it will be more clear how we want to structure that. But I can add explicit kobject as static variable right now, no problems. Later on we probably will just switch it to be exported, so that modules can self-register/unregister their BTFs autonomously. > > thanks, > > greg k-h