On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > Currently, our only way of writing dynamically-sized data into a ring > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > memcpy, but it can only safely support reservation sizes that are > statically known since the verifier cannot guarantee that the bpf > program won’t access memory outside the reserved space. > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > reservations without the extra memcpy. > > There are 3 new APIs: > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > These closely follow the functionalities of the original ringbuf APIs. > For example, all ringbuffer dynptrs that have been reserved must be > either submitted or discarded before the program exits. > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > --- Looks great! Modulo those four underscores, they are super confusing... > include/linux/bpf.h | 10 ++++- > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > kernel/bpf/helpers.c | 6 +++ > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > kernel/bpf/verifier.c | 18 +++++++-- > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > 6 files changed, 171 insertions(+), 4 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 757440406962..10efbec99e93 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -394,7 +394,10 @@ enum bpf_type_flag { > /* DYNPTR points to dynamically allocated memory. */ > DYNPTR_TYPE_MALLOC = BIT(8 + BPF_BASE_TYPE_BITS), > > - __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_MALLOC, > + /* DYNPTR points to a ringbuf record. */ > + DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), > + > + __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_RINGBUF, it's getting a bit old to have to update __BPF_TYPE_LAST_FLAG all the time, maybe let's do this: __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, and never touch it again? > }; > [...] > + * > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > + * Description > + * Discard reserved ring buffer sample through the dynptr > + * interface. This is a no-op if the dynptr is invalid/null. > + * > + * For more information on *flags*, please see > + * 'bpf_ringbuf_discard'. > + * Return > + * Nothing. Always succeeds. > */ let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not sure which one is more appropriate, probably just null one), so we can check in code whether some reservation was successful without knowing bpf_ringbuf_reserve_dynptr()'s return value > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ [...] > +BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags, > + struct bpf_dynptr_kern *, ptr) > +{ > + void *sample; > + int err; > + > + err = bpf_dynptr_check_size(size); > + if (err) { > + bpf_dynptr_set_null(ptr); > + return err; > + } > + > + sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags); I was so confused by these four underscored for a bit... Is this what's defined inside BPF_CALL_4 (and thus makes it ungreppable). Can you instead just open-code container_of and __bpf_ringbuf_reserve() directly to make it a bit easier to follow? And flags check as well. It will so much easier to understand what's going on. > + > + if (!sample) { > + bpf_dynptr_set_null(ptr); > + return -EINVAL; > + } > + > + bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size); > + > + return 0; > +} > + > +const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = { > + .func = bpf_ringbuf_reserve_dynptr, > + .ret_type = RET_INTEGER, > + .arg1_type = ARG_CONST_MAP_PTR, > + .arg2_type = ARG_ANYTHING, > + .arg3_type = ARG_ANYTHING, > + .arg4_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT, > +}; > + > +BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) > +{ > + if (!ptr->data) > + return 0; > + > + ____bpf_ringbuf_submit(ptr->data, flags); this just calls bpf_ringbuf_commit(), let's do it here explicitly as well > + > + bpf_dynptr_set_null(ptr); > + > + return 0; > +} > + > +const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = { > + .func = bpf_ringbuf_submit_dynptr, > + .ret_type = RET_VOID, > + .arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE, > + .arg2_type = ARG_ANYTHING, > +}; > + > +BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) > +{ > + if (!ptr->data) > + return 0; > + > + ____bpf_ringbuf_discard(ptr->data, flags); > + ditto > + bpf_dynptr_set_null(ptr); > + > + return 0; > +} > + [...]