On Thu, Apr 13, 2023 at 11:03 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > > > > > The cloned dynptr will point to the same data as its parent dynptr, > > > with the same type, offset, size and read-only properties. > > > > > > Any writes to a dynptr will be reflected across all instances > > > (by 'instance', this means any dynptrs that point to the same > > > underlying data). > > > > > > Please note that data slice and dynptr invalidations will affect all > > > instances as well. For example, if bpf_dynptr_write() is called on an > > > skb-type dynptr, all data slices of dynptr instances to that skb > > > will be invalidated as well (eg data slices of any clones, parents, > > > grandparents, ...). Another example is if a ringbuf dynptr is submitted, > > > any instance of that dynptr will be invalidated. > > > > > > Changing the view of the dynptr (eg advancing the offset or > > > trimming the size) will only affect that dynptr and not affect any > > > other instances. > > > > > > One example use case where cloning may be helpful is for hashing or > > > iterating through dynptr data. Cloning will allow the user to maintain > > > the original view of the dynptr for future use, while also allowing > > > views to smaller subsets of the data after the offset is advanced or the > > > size is trimmed. > > > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > > --- > > > kernel/bpf/helpers.c | 14 +++++ > > > kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++----- > > > 2 files changed, 126 insertions(+), 13 deletions(-) > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > index bac4c6fe49f0..108f3bcfa6da 100644 > > > --- a/kernel/bpf/helpers.c > > > +++ b/kernel/bpf/helpers.c > > > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr) > > > return ptr->offset; > > > } > > > > > > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr, > > > + struct bpf_dynptr_kern *clone__uninit) > > > > I think most of uses for bpf_dynptr_clone() would be to get a partial > > view (like you mentioned above, to, e.g., do a hashing of a part of > > the memory range). So it feels it would be best UX if clone would > > allow you to define a new range in one go. So e.g., to create a > > "sub-dynptr" for range of bytes [10, 30), it should be enough to: > > > > struct bpf_dynptr orig_ptr, new_ptr; > > > > bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr); > > > > Instead of: > > > > bpf_dynptr_clone(&orig_ptr, &new_ptr); > > bpf_dynptr_advance(&new_ptr, 10); > > bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30); > > > > I don't think we can do this because we can't have bpf_dynptr_clone() > fail (which might happen if we take in a range, if the range is > invalid). This is because in the case where the clone is of a > reference-counted dynptr (eg like a ringbuf-type dynptr), the clone > even if it's invalid must be treated by the verifier as a legit dynptr > (since the verifier can't know ahead of time whether the clone call > will succeed or not) which means if the invalid clone dynptr is then > passed into a reference-releasing function, the verifier will release > the reference but the actual reference won't be released at runtime > (since the clone dynptr is invalid), which would leak the reference. > An example is something like: > > // invalid range is passed, error is returned and new_ptr is invalid > bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr); > // this releases the reference and invalidates both new_ptr and ringbuf_ptr > bpf_ringbuf_discard_dynptr(&new_ptr, 0); > > At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr > is invalid - the ringbuf record still needs to be submitted/discarded, > but the verifier will think this already happened Ah, tricky, good point. But ok, I guess with bpf_dynptr_adjust() proposal in another email this would be ok: bpf_dynptr_clone(..); /* always succeeds */ bpf_dynptr_adjust(&new_ptr, 10, 30); /* could fail to adjust, but dynptr is left valid */ Right? > > > > > This, btw, shows the awkwardness of the bpf_dynptr_trim() approach. > > > > If someone really wanted an exact full-sized copy, it's trivial: > > > > bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr); > > > > > > BTW, let's rename bpf_dynptr_get_size -> bpf_dynptr_size()? That > > "get_" is a sore in the eye, IMO. > > Will do! > > > > > > > +{ > > > + if (!ptr->data) { > > > + bpf_dynptr_set_null(clone__uninit); > > > + return -EINVAL; > > > + } > > > + > > > + memcpy(clone__uninit, ptr, sizeof(*clone__uninit)); > > > > why memcpy instead of `*clone__uninit = *ptr`? > > > No good reason :) I'll change this for v2 > > > + > > > + return 0; > > > +} > > > + > > > __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj) > > > { > > > return obj; > > > @@ -2429,6 +2442,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null) > > > BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > > > BTF_ID_FLAGS(func, bpf_dynptr_get_size) > > > BTF_ID_FLAGS(func, bpf_dynptr_get_offset) > > > +BTF_ID_FLAGS(func, bpf_dynptr_clone) > > > BTF_SET8_END(common_btf_ids) > > > > > > static const struct btf_kfunc_id_set common_kfunc_set = { > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index 3660b573048a..804cb50050f9 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -302,6 +302,7 @@ struct bpf_kfunc_call_arg_meta { > > > struct { > > > enum bpf_dynptr_type type; > > > u32 id; > > > + u32 ref_obj_id; > > > } initialized_dynptr; > > > struct { > > > u8 spi; > > > @@ -963,24 +964,15 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_ > > > return 0; > > > } > > > > > > -static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > > +static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi) > > > { > > > - struct bpf_func_state *state = func(env, reg); > > > - int spi, i; > > > - > > > - spi = dynptr_get_spi(env, reg); > > > - if (spi < 0) > > > - return spi; > > > + int i; > > > > > > for (i = 0; i < BPF_REG_SIZE; i++) { > > > state->stack[spi].slot_type[i] = STACK_INVALID; > > > state->stack[spi - 1].slot_type[i] = STACK_INVALID; > > > } > > > > > > - /* Invalidate any slices associated with this dynptr */ > > > - if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) > > > - WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id)); > > > - > > > __mark_reg_not_init(env, &state->stack[spi].spilled_ptr); > > > __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr); > > > > > > @@ -1007,6 +999,51 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re > > > */ > > > state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN; > > > state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN; > > > +} > > > + > > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > > +{ > > > + struct bpf_func_state *state = func(env, reg); > > > + int spi; > > > + > > > + spi = dynptr_get_spi(env, reg); > > > + if (spi < 0) > > > + return spi; > > > + > > > + if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) { > > > + int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id; > > > + int i; > > > + > > > + /* If the dynptr has a ref_obj_id, then we need to invaldiate > > > > typo: invalidate > > > > > + * two things: > > > + * > > > + * 1) Any dynptrs with a matching ref_obj_id (clones) > > > + * 2) Any slices associated with the ref_obj_id > > > > I think this is where this dynptr_id confusion comes from. The rule > > should be "any slices derived from this dynptr". But as mentioned on > > another thread, it's a separate topic which we can address later. > > > If there's a parent and a clone and slices derived from the parent and > slices derived from the clone, if the clone is invalidated then we > need to invalidate slices associated with the parent as well. So > shouldn't it be "any slices associated with the ref obj id" not "any > slices derived from this dynptr"? (also just a note, parent/clone > slices will share the same ref obj id and the same dynptr id, so > checking against either does the same thing) So, we have a ringbuf dynptr with ref_obj_id=1, id=2, ok? We clone it, clone gets ref_obj_id=1, id=3. If either original dynptr or clone dynptr is released due to bpf_ringbuf_discard_dynptr(), we invalidate all the dynptrs with ref_obj_id=1. During invalidation of each dynptr, we invalidate all the slices with ref_obj_id==dynptr's id. So we'll invalidate slices derived from dynptr with id=2 (original dynptr), and then all the slices derived from dynptr with id=3? > > > > + */ > > > + > > > + /* Invalidate any slices associated with this dynptr */ > > > + WARN_ON_ONCE(release_reference(env, ref_obj_id)); > > > + > > > + /* Invalidate any dynptr clones */ > > > + for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) { > > > + if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) { > > > > nit: invert if condition and continue to reduce nestedness of the rest > > the loop body > > Ooh good idea > > > > > + /* it should always be the case that if the ref obj id > > > + * matches then the stack slot also belongs to a > > > + * dynptr > > > + */ > > > + if (state->stack[i].slot_type[0] != STACK_DYNPTR) { > > > + verbose(env, "verifier internal error: misconfigured ref_obj_id\n"); > > > + return -EFAULT; > > > + } > > > + if (state->stack[i].spilled_ptr.dynptr.first_slot) > > > + invalidate_dynptr(env, state, i); > > > + } > > > + } > > > + > > > + return 0; > > > + } > > > + > > > + invalidate_dynptr(env, state, spi); > > > > > > return 0; > > > } > > > @@ -6967,6 +7004,50 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx, > > > return 0; > > > } > > > > > > +static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type, > > > + int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta) > > > +{ > > > + struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; > > > + struct bpf_reg_state *first_reg_state, *second_reg_state; > > > + struct bpf_func_state *state = func(env, reg); > > > + enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type; > > > + int err, spi, ref_obj_id; > > > + > > > + if (!dynptr_type) { > > > + verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n"); > > > + return -EFAULT; > > > + } > > > + arg_type |= get_dynptr_type_flag(dynptr_type); > > > > > > what about MEM_RDONLY and MEM_UNINIT flags, do we need to derive and add them? > > I don't think we need MEM_UNINIT because we can't clone an > uninitialized dynptr. But yes, definitely MEM_RDONLY. I missed that, I > will add it in in v2 > > > > > > + > > > + err = process_dynptr_func(env, regno, insn_idx, arg_type); > > > + if (err < 0) > > > + return err; > > > + > > > + spi = dynptr_get_spi(env, reg); > > > + if (spi < 0) > > > + return spi; > > > + > > > + first_reg_state = &state->stack[spi].spilled_ptr; > > > + second_reg_state = &state->stack[spi - 1].spilled_ptr; > > > + ref_obj_id = first_reg_state->ref_obj_id; > > > + > > > + /* reassign the clone the same dynptr id as the original */ > > > + __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id); > > > + __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id); > > > > I'm not sure why clone should have the same dynptr_id? Isn't it a new > > instance of a dynptr? I get preserving ref_obj_id (if refcounted), but > > why reusing dynptr_id?.. > > > I think we need to also copy over the dynptr id because in the case of > a non-reference counted dynptr, if the parent (or clone) is > invalidated (eg overwriting bytes of the dynptr on the stack), we must > also invalidate the slices of the clone (or parent) yep, right now we'll have to do that because we have dynptr_id. But if we get rid of it and stick to ref_obj_id and id, then clone would need to get a new id, but keep ref_obj_id, right? > > > > > + > > > + if (meta->initialized_dynptr.ref_obj_id) { > > > + /* release the new ref obj id assigned during process_dynptr_func */ > > > + err = release_reference_state(cur_func(env), ref_obj_id); > > > + if (err) > > > + return err; > > > > ugh... this is not good to create reference just to release. If we > > need to reuse logic, let's reuse the logic without parts that > > shouldn't happen. Please check if we can split process_dynptr_func() > > appropriately to allow this. > > I'll change this for v2. I think the simplest approach would be having > mark_stack_slots_dynptr() take in a boolean or something that'll > indicate whether it should acquire a new reference state or not > > > > > + /* reassign the clone the same ref obj id as the original */ > > > + first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id; > > > + second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id; > > > + } > > > + > > > + return 0; > > > +} > > > + > > > > [...]