On Tue, Nov 22, 2022 at 9:36 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > > On 09/11/2022 22:43, Andrii Nakryiko wrote: > > On Mon, Nov 7, 2022 at 8:37 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >> > >> On 05/11/2022 22:54, Alexei Starovoitov wrote: > >>> On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >>>> > >>>> Not all kernel modules can be built in-tree when the core > >>>> kernel is built. This presents a problem for split BTF, because > >>>> split module BTF refers to type ids in the base kernel BTF, and > >>>> if that base kernel BTF changes (even in minor ways) those > >>>> references become invalid. Such modules then cannot take > >>>> advantage of BTF (or at least they only can until the kernel > >>>> changes enough to invalidate their vmlinux type id references). > >>>> This problem has been discussed before, and the initial approach > >>>> was to allow BTF mismatch but fail to load BTF. See [1] > >>>> for more discussion. > >>>> > >>>> Generating standalone BTF for modules helps solve this problem > >>>> because the BTF generated is self-referential only. However, > >>>> tooling is geared towards split BTF - for example bpftool assumes > >>>> a module's BTF is defined relative to vmlinux BTF. To handle > >>>> this, dynamic remapping of standalone BTF is done on module > >>>> load to make it appear like split BTF - type ids and string > >>>> offsets are remapped such that they appear as they would in > >>>> split BTF. It just so happens that the BTF is self-referential. > >>>> With this approach, existing tooling works with standalone > >>>> module BTF from /sys/kernel/btf in the same way as before; > >>>> no knowledge of split versus standalone BTF is required. > >>>> > >>>> Currently, the approach taken is to assume that the BTF > >>>> associated with a module is split BTF. If however the > >>>> checking of types fails, we fall back to interpreting it as > >>>> standalone BTF and carrying out remapping. As discussed in [1] > >>>> there are some heuristics we could use to identify standalone > >>>> versus split module BTF, but for now the simplistic fallback > >>>> method is used. > >>>> > >>>> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx> > >>>> > >>>> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/ > >>>> --- > >>>> kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 132 insertions(+) > >>>> > >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > >>>> index 5579ff3..5efdcaf 100644 > >>>> --- a/kernel/bpf/btf.c > >>>> +++ b/kernel/bpf/btf.c > >>>> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void) > >>>> > >>>> #ifdef CONFIG_DEBUG_INFO_BTF_MODULES > >>>> > >>>> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off) > >>>> +{ > >>>> + return name_off + btf->start_str_off; > >>>> +} > >>>> + > >>>> +static u32 btf_id_renumber(struct btf *btf, u32 id) > >>>> +{ > >>>> + /* no need to renumber void */ > >>>> + if (id == 0) > >>>> + return id; > >>>> + return id + btf->start_id - 1; > >>>> +} > >>>> + > >>>> +/* Renumber standalone BTF to appear as split BTF; name offsets must > >>>> + * be relative to btf->start_str_offset and ids relative to btf->start_id. > >>>> + * When user sees BTF it will appear as normal module split BTF, the only > >>>> + * difference being it is fully self-referential and does not refer back > >>>> + * to vmlinux BTF (aside from 0 "void" references). > >>>> + */ > >>>> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t) > >>>> +{ > >>>> + struct btf_var_secinfo *secinfo; > >>>> + struct btf *btf = env->btf; > >>>> + struct btf_member *member; > >>>> + struct btf_param *param; > >>>> + struct btf_array *array; > >>>> + struct btf_enum64 *e64; > >>>> + struct btf_enum *e; > >>>> + int i; > >>>> + > >>>> + t->name_off = btf_name_off_renumber(btf, t->name_off); > >>>> + > >>>> + switch (BTF_INFO_KIND(t->info)) { > >>>> + case BTF_KIND_INT: > >>>> + case BTF_KIND_FLOAT: > >>>> + case BTF_KIND_TYPE_TAG: > >>>> + /* nothing to renumber here, no type references */ > >>>> + break; > >>>> + case BTF_KIND_PTR: > >>>> + case BTF_KIND_FWD: > >>>> + case BTF_KIND_TYPEDEF: > >>>> + case BTF_KIND_VOLATILE: > >>>> + case BTF_KIND_CONST: > >>>> + case BTF_KIND_RESTRICT: > >>>> + case BTF_KIND_FUNC: > >>>> + case BTF_KIND_VAR: > >>>> + case BTF_KIND_DECL_TAG: > >>>> + /* renumber the referenced type */ > >>>> + t->type = btf_id_renumber(btf, t->type); > >>>> + break; > >>>> + case BTF_KIND_ARRAY: > >>>> + array = btf_array(t); > >>>> + array->type = btf_id_renumber(btf, array->type); > >>>> + array->index_type = btf_id_renumber(btf, array->index_type); > >>>> + break; > >>>> + case BTF_KIND_STRUCT: > >>>> + case BTF_KIND_UNION: > >>>> + member = (struct btf_member *)(t + 1); > >>>> + for (i = 0; i < btf_type_vlen(t); i++) { > >>>> + member->type = btf_id_renumber(btf, member->type); > >>>> + member->name_off = btf_name_off_renumber(btf, member->name_off); > >>>> + member++; > >>>> + } > >>>> + break; > >>>> + case BTF_KIND_FUNC_PROTO: > >>>> + param = (struct btf_param *)(t + 1); > >>>> + for (i = 0; i < btf_type_vlen(t); i++) { > >>>> + param->type = btf_id_renumber(btf, param->type); > >>>> + param->name_off = btf_name_off_renumber(btf, param->name_off); > >>>> + param++; > >>>> + } > >>>> + break; > >>>> + case BTF_KIND_DATASEC: > >>>> + secinfo = (struct btf_var_secinfo *)(t + 1); > >>>> + for (i = 0; i < btf_type_vlen(t); i++) { > >>>> + secinfo->type = btf_id_renumber(btf, secinfo->type); > >>>> + secinfo++; > >>>> + } > >>>> + break; > >>>> + case BTF_KIND_ENUM: > >>>> + e = (struct btf_enum *)(t + 1); > >>>> + for (i = 0; i < btf_type_vlen(t); i++) { > >>>> + e->name_off = btf_name_off_renumber(btf, e->name_off); > >>>> + e++; > >>>> + } > >>>> + break; > >>>> + case BTF_KIND_ENUM64: > >>>> + e64 = (struct btf_enum64 *)(t + 1); > >>>> + for (i = 0; i < btf_type_vlen(t); i++) { > >>>> + e64->name_off = btf_name_off_renumber(btf, e64->name_off); > >>>> + e64++; > >>>> + } > >>>> + break; > >>>> + } > >>>> +} > >>>> + > >>>> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf) > >>>> +{ > >>>> + struct btf *btf = env->btf; > >>>> + int i; > >>>> + > >>>> + btf->start_id = base_btf->nr_types; > >>>> + btf->start_str_off = base_btf->hdr.str_len; > >>>> + > >>>> + for (i = 0; i < btf->nr_types; i++) > >>>> + btf_type_renumber(env, btf->types[i]); > >>>> +} > >>>> + > >>>> static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size) > >>>> { > >>>> struct btf_verifier_env *env = NULL; > >>>> struct bpf_verifier_log *log; > >>>> struct btf *btf = NULL, *base_btf; > >>>> + bool standalone = false; > >>>> int err; > >>>> > >>>> base_btf = bpf_get_btf_vmlinux(); > >>>> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u > >>>> goto errout; > >>>> > >>>> err = btf_check_all_metas(env); > >>>> + if (err) { > >>>> + /* BTF may be standalone; in that case meta checks will > >>>> + * fail and we fall back to standalone BTF processing. > >>>> + * Later on, once we have checked all metas, we will > >>>> + * retain start id from base BTF so it will look like > >>>> + * split BTF (but is self-contained); renumbering is done > >>>> + * also to give the split BTF-like appearance and not > >>>> + * confuse pahole which assumes split BTF for modules. > >>>> + */ > >>>> + btf->base_btf = NULL; > >>>> + if (btf->types) > >>>> + kvfree(btf->types); > >>>> + btf->types = NULL; > >>>> + btf->types_size = 0; > >>>> + btf->start_id = 0; > >>>> + btf->nr_types = 0; > >>>> + btf->start_str_off = 0; > >>>> + standalone = true; > >>>> + err = btf_check_all_metas(env); > >>>> + } > >>> > >>> Interesting idea! > >>> Instead of failing the first time, how about we make > >>> such standalone module BTFs explicit? > >>> Some flag or special type? > >>> Then the kernel just checks that and renumbers right away. > >>> > >> > >> I was thinking that might be one way to do it, perhaps even > >> a .BTF_standalone section name or somesuch as a signal we > >> are dealing with standalone BTF. However I _think_ > >> we can actually determine the module BTF is standalone > >> without needing to change anything in the toolchain (I > >> think adding flags would require that). > > > > Why not just extend btf_header to contains extra information where we > > can record whether it is stand-alone or split, what's the checksum of > > base BTF, etc, etc. Yes, we'll need to teach libbpf and some tools > > about this v2 of btf_header, but it's also an opportunity to make BTF > > a bit more self-describing. E.g., right now there is a pretty big > > problem that when we add new BTF_KIND_XXX, no existing tooling will be > > able to do anything with BTF that contains that new kind, even if that > > kind is completely optional and uninteresting for most tools (e.g., if > > some particular tool didn't care about DECL_TAG). So with v2 we can > > record a small table that records each kind's size: extra info size > > and per-element size (for types that have vlen>0). > > > > More upfront work, but solves few existing problems and we can reserve > > space for future fields as well. > > > > WDYT? > > > > Sorry Andrii, missed this reply. With respect to self-describing BTF, > I've got a patch series that I was planning on sending out soon which > approaches this by describing BTF kind encodings in BTF. The handy thing > about that is as you say it allows us to parse BTF even if we don't > actually use features, so when new kinds are added we can skip past > them, but if a new libbpf comes along we can potentially unlock these > features. This is particularly valuable for kernel/module BTF since > the kernel might be around for a while and we would rather encode > BTF optimistically. The other useful thing is it won't itself require > any BTF format changes; it's simply a matter of adding a libbpf call > to pahole which says "encode BTF kinds", and that part is done using > basic BTF kinds like typedefs+structs. The benefit of doing this in > BTF is that we don't need to worry about header incompatibility etc. > Just adds a few hundred bytes to overall kernel BTF too, since kind > encodings are only needed for vmlinux (or standalone) BTF. We also > end up with BTF kind structures in vmlinux.h, which could potentially > simplify BTF introspection in BPF programs in the future also. > > The handy thing about this approach is also that the kernel code > that parses the BTF kind descriptions is easy to backport, so we > could even look at backporting that patch to stable kernels such > that they would be an a position to parse newer BTF kinds even > if they could not use them. Because that parsing is based around > interpreting existing BTF kinds, it is minimally invasive. > > I'll send it out as an RFC shortly to provide additional context. I'm not clear how that handles kinds with multiple entities (like fields for STRUCTs, args for FUNC_PROTO, etc), but I'll look at RFC. I think a small table of kind ID -> prefix size + per-element size mappings is useful for any tool that works with BTF, because it allows you to skip unknown stuff. And it works not just for vmlinux BTF. > > Alan