On Mon, Aug 8, 2022 at 8:54 AM David Vernet <void@xxxxxxxxxxxxx> wrote: > > We want to support a ringbuf map type where samples are published from > user-space to BPF programs. BPF currently supports a kernel -> user-space > circular ringbuffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to > define a new map type for user-space -> kernel, as none of the helpers > exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer > ringbuffer, and we'll want to add one or more helper functions that would > not apply for a kernel-producer ringbuffer. > > This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type > definition. The map type is useless in its current form, as there is no way > to access or use it for anything until we add more BPF helpers. A follow-on > patch will therefore add a new helper function that allows BPF programs to > run callbacks on samples that are published to the ringbuffer. > > Signed-off-by: David Vernet <void@xxxxxxxxxxxxx> > --- > include/linux/bpf_types.h | 1 + > include/uapi/linux/bpf.h | 1 + > kernel/bpf/ringbuf.c | 70 +++++++++++++++++++++++++++++----- > kernel/bpf/verifier.c | 3 ++ > tools/include/uapi/linux/bpf.h | 1 + > tools/lib/bpf/libbpf.c | 1 + > 6 files changed, 68 insertions(+), 9 deletions(-) > > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h > index 2b9112b80171..2c6a4f2562a7 100644 > --- a/include/linux/bpf_types.h > +++ b/include/linux/bpf_types.h > @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) > #endif > BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) > BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) > +BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops) > > BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) > BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 7bf9ba1329be..a341f877b230 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -909,6 +909,7 @@ enum bpf_map_type { > BPF_MAP_TYPE_INODE_STORAGE, > BPF_MAP_TYPE_TASK_STORAGE, > BPF_MAP_TYPE_BLOOM_FILTER, > + BPF_MAP_TYPE_USER_RINGBUF, > }; > > /* Note that tracing related programs such as > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c > index ded4faeca192..29e2de42df15 100644 > --- a/kernel/bpf/ringbuf.c > +++ b/kernel/bpf/ringbuf.c > @@ -38,12 +38,32 @@ struct bpf_ringbuf { > struct page **pages; > int nr_pages; > spinlock_t spinlock ____cacheline_aligned_in_smp; > - /* Consumer and producer counters are put into separate pages to allow > - * mapping consumer page as r/w, but restrict producer page to r/o. > - * This protects producer position from being modified by user-space > - * application and ruining in-kernel position tracking. > + /* Consumer and producer counters are put into separate pages to > + * allow each position to be mapped with different permissions. > + * This prevents a user-space application from modifying the > + * position and ruining in-kernel tracking. The permissions of the > + * pages depend on who is producing samples: user-space or the > + * kernel. > + * > + * Kernel-producer > + * --------------- > + * The producer position and data pages are mapped as r/o in > + * userspace. For this approach, bits in the header of samples are > + * used to signal to user-space, and to other producers, whether a > + * sample is currently being written. > + * > + * User-space producer > + * ------------------- > + * Only the page containing the consumer position, and whether the > + * ringbuffer is currently being consumed via a 'busy' bit, are > + * mapped r/o in user-space. Sample headers may not be used to > + * communicate any information between kernel consumers, as a > + * user-space application could modify its contents at any time. > */ > - unsigned long consumer_pos __aligned(PAGE_SIZE); > + struct { > + unsigned long consumer_pos; > + atomic_t busy; one more thing, why does busy have to be exposed into user-space mapped memory at all? Can't it be just a private variable in bpf_ringbuf? > + } __aligned(PAGE_SIZE); > unsigned long producer_pos __aligned(PAGE_SIZE); > char data[] __aligned(PAGE_SIZE); > }; [...]