On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>> There, a BTF record is created for any type containing a spin_lock or
>> any next-gen datastructure node/head.
>>
>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>> a record using struct_meta_tab if the reg->type exactly matches
>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>> flags while working its way through the program.
>>
>
> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that
> can be passed to helpers. reg_btf_record is used in helpers to inspect
> the btf_record. Any other flag combination (the only one possible is
> PTR_UNTRUSTED right now) cannot be passed to helpers in the first place.
> The reason to set PTR_UNTRUSTED is to make them unpassable to helpers.
>

I see what you mean. If reg_btf_record is only used on regs which are
args, then the exact match helps enforce PTR_UNTRUSTED not being an
acceptable type flag for an arg. Most uses of reg_btf_record seem to be
on arg regs, but then we have its use in reg_may_point_to_spin_lock,
which is itself used in mark_ptr_or_null_reg and on BPF_REG_0 in
check_kfunc_call. So I'm not sure that it's only used on arg regs
currently.

Regardless, if the intended use is on arg regs only, it should be renamed
to arg_reg_btf_record or similar to make that clear, as the current name
sounds like it should be applicable to any reg, and thus shouldn't enforce
constraints particular to arg regs. But I think it's better to leave it
general and enforce those constraints elsewhere. For kfuncs this is
already happening in check_kfunc_args, where the big switch statements
for KF_ARG_* are doing exact type matching.

>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>> for type_flag.
>>
>> This patch is marked Fixes as the original intent of reg_btf_record was
>> unlikely to have been to fail finding btf_record for valid alloc obj
>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>> are valid register type states for alloc obj independent of this series.
>
> That was the actual intent, same as how check_ptr_to_btf_access uses the
> exact reg->type to allow the BPF_WRITE case.
>
> I think this series is the one introducing this case, passing
> bpf_rbtree_first's result to bpf_rbtree_remove, which I think is not
> possible to make safe in the first place. We decided to do
> bpf_list_pop_front instead of bpf_list_entry -> bpf_list_del due to this
> exact issue. More in [0].
>
> [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@xxxxxxxxxxxxxx
>

Thanks for the link, I better understand what Alexei meant in his comment
on patch 9 of this series.

For the helpers added in this series, we can make bpf_rbtree_first ->
bpf_rbtree_remove safe by invalidating all release_on_unlock refs after
the rbtree_remove, in the same manner as they're invalidated after
spin_unlock currently.

Logic for why this is safe:

* If we have two non-owning refs to nodes in a tree, e.g. from
  bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
  we have no way of knowing if they're aliases of the same node.
* If bpf_rbtree_remove takes an arbitrary non-owning ref to a node in the
  tree, it might be removing a node that's already been removed, e.g.:

    n = bpf_obj_new(...);
    bpf_spin_lock(&lock);

    bpf_rbtree_add(&tree, &n->node);
    // n is now a non-owning ref to the node which was added

    res = bpf_rbtree_first(&tree);
    if (!res) {}
    m = container_of(res, struct node_data, node);
    // m is now a non-owning ref to the same node

    bpf_rbtree_remove(&tree, &n->node);
    bpf_rbtree_remove(&tree, &m->node); // BAD

    bpf_spin_unlock(&lock);

* bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at
  risk of pointing to something that was already removed _only_ after a
  rbtree_remove, so if we invalidate them all after rbtree_remove they
  can't be inputs to subsequent remove()s.

This does conflate the current "release non-owning refs because it's not
safe to read from them" reasoning with the new "release non-owning refs so
they can't be passed to remove()". Ideally we could add some new tag to
these refs that prevents them from being passed to remove()-type fns, but
does allow them to be read, e.g.:

    n = bpf_obj_new(...);
    bpf_spin_lock(&lock);

    bpf_rbtree_add(&tree, &n->node);
    // n is now a non-owning ref to the node which was added

    res = bpf_rbtree_first(&tree);
    if (!res) {}
    m = container_of(res, struct node_data, node);
    // m is now a non-owning ref to the same node

    n = bpf_rbtree_remove(&tree, &n->node);
    // n is now an owning ref again, m is a non-owning ref to the same node

    x = m->key; // this should be safe since we're still in the CS
    bpf_rbtree_remove(&tree, &m->node); // but this should be prevented

    bpf_spin_unlock(&lock);

But this would introduce too much additional complexity for now IMO. The
proposal of just invalidating all non-owning refs prevents both the unsafe
second remove() and the safe x = m->key.

I will give it a shot; if it doesn't work, can change rbtree_remove to
rbtree_remove_first w/o the node param. But per that linked convo such
logic should be tackled eventually, so might as well chip away at it now.
(A rough verifier-side sketch of the invalidation is at the end of this
mail.)

>> However, I didn't find a specific broken repro case outside of this
>> series' added functionality, so it's possible that nothing was
>> triggering this logic error before.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
>> cc: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
>> Fixes: 4e814da0d599 ("bpf: Allow locking bpf_spin_lock in allocated objects")
>> ---
>>  kernel/bpf/verifier.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 1d51bd9596da..67a13110bc22 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -451,6 +451,11 @@ static bool reg_type_not_null(enum bpf_reg_type type)
>>  		type == PTR_TO_SOCK_COMMON;
>>  }
>>
>> +static bool type_is_ptr_alloc_obj(u32 type)
>> +{
>> +	return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC;
>> +}
>> +
>>  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>  {
>>  	struct btf_record *rec = NULL;
>> @@ -458,7 +463,7 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>
>>  	if (reg->type == PTR_TO_MAP_VALUE) {
>>  		rec = reg->map_ptr->record;
>> -	} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
>> +	} else if (type_is_ptr_alloc_obj(reg->type)) {
>>  		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
>>  		if (meta)
>>  			rec = meta->record;
>> --
>> 2.30.2
>>
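To make the "invalidate after rbtree_remove" proposal concrete, here's a
rough verifier-side sketch. It leans on the release_on_unlock machinery
that already exists (bpf_reference_state's release_on_unlock bit,
acquired_refs, release_reference, cur_func) and mirrors the loop
process_spin_lock runs at unlock time; the helper name and the exact call
site in check_kfunc_call are hypothetical, not the actual patch:

    /* Hypothetical sketch: release every release_on_unlock (i.e.
     * non-owning) ref in the current frame, the same way
     * process_spin_lock does for bpf_spin_unlock today.
     */
    static int invalidate_release_on_unlock_refs(struct bpf_verifier_env *env)
    {
    	struct bpf_func_state *state = cur_func(env);
    	int i, err;

    	/* Walk downward since release_reference shrinks the ref table */
    	for (i = state->acquired_refs - 1; i >= 0; i--) {
    		if (!state->refs[i].release_on_unlock)
    			continue;
    		/* release_reference marks all regs sharing this
    		 * ref_obj_id as unknown, so a stale non-owning ref
    		 * can't reach a second bpf_rbtree_remove.
    		 */
    		err = release_reference(env, state->refs[i].id);
    		if (err)
    			return err;
    	}
    	return 0;
    }

check_kfunc_call would invoke this right after processing a
bpf_rbtree_remove call; whether that and the spin_unlock path share one
factored-out loop is a detail for the actual patch.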