On Mon, May 9, 2022 at 1:35 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > > > > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > > > > > > > > > Currently, our only way of writing dynamically-sized data into a ring > > > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > > > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > > > > > memcpy, but it can only safely support reservation sizes that are > > > > > statically known since the verifier cannot guarantee that the bpf > > > > > program won’t access memory outside the reserved space. > > > > > > > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > > > > > reservations without the extra memcpy. > > > > > > > > > > There are 3 new APIs: > > > > > > > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > > > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > > > > > > These closely follow the functionalities of the original ringbuf APIs. > > > > > For example, all ringbuffer dynptrs that have been reserved must be > > > > > either submitted or discarded before the program exits. > > > > > > > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > > > > --- > > > > > > > > Looks great! Modulo those four underscores, they are super confusing... > > > > > > > > > include/linux/bpf.h | 10 ++++- > > > > > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > > kernel/bpf/helpers.c | 6 +++ > > > > > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > > > > > kernel/bpf/verifier.c | 18 +++++++-- > > > > > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > > 6 files changed, 171 insertions(+), 4 deletions(-) > > > > > > > > [...] > > > > > > > > > > > > > [...] > > > > > > > > > + * > > > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > > > > > + * Description > > > > > + * Discard reserved ring buffer sample through the dynptr > > > > > + * interface. This is a no-op if the dynptr is invalid/null. > > > > > + * > > > > > + * For more information on *flags*, please see > > > > > + * 'bpf_ringbuf_discard'. > > > > > + * Return > > > > > + * Nothing. Always succeeds. > > > > > */ > > > > > > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not > > > > sure which one is more appropriate, probably just null one), so we can > > > > check in code whether some reservation was successful without knowing > > > > bpf_ringbuf_reserve_dynptr()'s return value > > > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset > > > (convenience helpers). Do you prefer that this be part of this > > > patchset instead? If so, do you think this should be part of the 2nd > > > patch (aka the one where we set up the infra for dynptrs + implement > > > malloc-type dynptrs) or this ringbuf patch or its own patch? > > > > No problem adding it in a follow up patch. > > > > BTW, is it still in the plan to be able to create bpf_dynptr() from > > map_value, global variables, etc? I.e., it's a LOCAL dynptr except > > memory is not on STACK. > > > > Something like > > > > int k = 123; > > struct my_val *v; > > struct bpf_dynptr p; > > > > v = bpf_map_lookup_elem(&my_map, &k); > > if (!v) return 0; > > > > bpf_dynptr_from_mem(&v->my_data, &p); > > > > /* p points inside my_map's value */ > > > > ? > The plan is to still support some types of local dynptrs (eg dynptr to > ctx skbuf / xdp data). If it would be useful to also have this for > map_value, we can add this as well (the RCU protects against the map > value being freed out from under the dynptr, I believe). Yep, I think it's useful and should be pretty straightforward (there are no self-referential issues as with PTR_TO_STACK). > > > > > > > > > > > > > > > [...] > > > > [...]