On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
> >> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
> >> There, a BTF record is created for any type containing a spin_lock or
> >> any next-gen datastructure node/head.
> >>
> >> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
> >> a record using struct_meta_tab if the reg->type exactly matches
> >> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
> >> "allocated obj" type - returned from bpf_obj_new - might pick up other
> >> flags while working its way through the program.
> >>
> >
> > Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that
> > can be passed to helpers. reg_btf_record is used in helpers to inspect the
> > btf_record. Any other flag combination (the only one possible is
> > PTR_UNTRUSTED right now) cannot be passed to helpers in the first place.
> > The reason to set PTR_UNTRUSTED is to make them unpassable to helpers.
> >
>
> I see what you mean. If reg_btf_record is only used on regs which are args,
> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
> but then we have its use in reg_may_point_to_spin_lock, which is itself
> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm
> not sure that it's only used on arg regs currently.
>
> Regardless, if the intended use is on arg regs only, it should be renamed
> to arg_reg_btf_record or similar to make that clear, as the current name
> sounds like it should be applicable to any reg, and thus shouldn't enforce
> constraints particular to arg regs.
>
> But I think it's better to leave it general and enforce those constraints
> elsewhere. For kfuncs this is already happening in check_kfunc_args, where
> the big switch statements for KF_ARG_* are doing exact type matching.
>
> >> Loosen the check to be exact for base_type and just use the MEM_ALLOC
> >> mask for type_flag.
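
IOW, the non-MAP_VALUE branch of reg_btf_record() would then match on
something like this (my paraphrase of the idea, not the exact diff;
the helper name is mine):

  static bool type_is_alloc_obj(u32 type)
  {
          /* exact match on the base type, but only require MEM_ALLOC
           * to be present in the flags, so e.g.
           * PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED would still
           * find the btf_record
           */
          return base_type(type) == PTR_TO_BTF_ID &&
                 (type_flag(type) & MEM_ALLOC);
  }

base_type() and type_flag() being the existing mask helpers from
include/linux/bpf.h.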
> >>
> >> This patch is marked Fixes as the original intent of reg_btf_record was
> >> unlikely to have been to fail finding the btf_record for valid alloc obj
> >> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
> >> are valid register type states for alloc obj independent of this series.
> >
> > That was the actual intent, same as how check_ptr_to_btf_access uses the
> > exact reg->type to allow the BPF_WRITE case.
> >
> > I think this series is the one introducing this case, passing
> > bpf_rbtree_first's result to bpf_rbtree_remove, which I think is not
> > possible to make safe in the first place. We decided to do
> > bpf_list_pop_front instead of bpf_list_entry -> bpf_list_del due to this
> > exact issue. More in [0].
> >
> > [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@xxxxxxxxxxxxxx
> >
>
> Thanks for the link, I better understand what Alexei meant in his comment
> on patch 9 of this series. For the helpers added in this series, we can
> make bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all
> release_on_unlock refs after the rbtree_remove, in the same manner as
> they're invalidated after spin_unlock currently.
>
> Logic for why this is safe:
>
> * If we have two non-owning refs to nodes in a tree, e.g. from
>   bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>   we have no way of knowing if they're aliases of the same node.
>
> * If bpf_rbtree_remove takes an arbitrary non-owning ref to a node in the
>   tree, it might be removing a node that's already been removed, e.g.:
>
>   n = bpf_obj_new(...);
>   bpf_spin_lock(&lock);
>
>   bpf_rbtree_add(&tree, &n->node);
>   // n is now non-owning ref to node which was added
>   res = bpf_rbtree_first(&tree);
>   if (!res) {}
>   m = container_of(res, struct node_data, node);
>   // m is now non-owning ref to the same node
>   bpf_rbtree_remove(&tree, &n->node);
>   bpf_rbtree_remove(&tree, &m->node); // BAD

Let me clarify my previous email: the above doesn't have to be 'BAD'.

Instead of
  if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
we can drop the WARN and simply return: if the node is not part of the
tree -> nop. Same for bpf_rbtree_add: if it's already added -> nop.
(See the sketch at the bottom of this mail.)

Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire
semantics. We do all these checks under the same rbtree root lock, so
it's safe.

>   bpf_spin_unlock(&lock);
>
> * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at
>   risk of pointing to something that was already removed _only_ after a
>   rbtree_remove, so if we invalidate them all after rbtree_remove they
>   can't be inputs to subsequent remove()s.

With the above proposed run-time checks, both bpf_rbtree_remove and
bpf_rbtree_add can have release semantics. No need for special
release_on_unlock hacks.

> This does conflate the current "release non-owning refs because it's not
> safe to read from them" reasoning with the new "release non-owning refs so
> they can't be passed to remove()". Ideally we could add some new tag to
> these refs that prevents them from being passed to remove()-type fns, but
> does allow them to be read, e.g.:
>
>   n = bpf_obj_new(...);

'n' is acquired.

>   bpf_spin_lock(&lock);
>
>   bpf_rbtree_add(&tree, &n->node);
>   // n is now non-owning ref to node which was added

since bpf_rbtree_add does release on 'n'...

>   res = bpf_rbtree_first(&tree);
>   if (!res) {}
>   m = container_of(res, struct node_data, node);
>   // m is now non-owning ref to the same node

... the below is not allowed by the verifier.

>   n = bpf_rbtree_remove(&tree, &n->node);

I'm not sure what the idea is behind returning 'n' from remove...
Maybe it should be a simple bool?

>   // n is now owning ref again, m is non-owning ref to same node
>   x = m->key; // this should be safe since we're still in CS

The below works because 'm' comes from bpf_rbtree_first, which acquired
'res'.

>   bpf_rbtree_remove(&tree, &m->node); // But this should be prevented
>
>   bpf_spin_unlock(&lock);
>
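
To make the run-time check idea concrete, roughly (a sketch only; the
actual signatures/return types are whatever the series settles on, and
it assumes bpf_rb_root/bpf_rb_node stay layout-compatible with
rb_root/rb_node and that nodes are RB_CLEAR_NODE()'d when allocated):

  /* called with the rbtree root lock held */
  void bpf_rbtree_remove(struct bpf_rb_root *root, struct bpf_rb_node *node)
  {
          struct rb_root *r = (struct rb_root *)root;
          struct rb_node *n = (struct rb_node *)node;

          /* not in the tree (never added or already removed) -> nop */
          if (RB_EMPTY_NODE(n))
                  return;

          rb_erase(n, r);
          RB_CLEAR_NODE(n);
  }

and the mirror image in bpf_rbtree_add: bail out when !RB_EMPTY_NODE(n),
so a double add is a nop as well. Since the bad cases degrade to nops,
the verifier can treat both as plain release kfuncs.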