On Wed, Jan 11, 2023 at 12:27 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Tue, Jan 10, 2023 at 02:49:59PM +0100, andrea terzolo wrote:
> > Hello!
> >
> > If I may, I'd like to ask a question about the BPF_MAP_TYPE_RINGBUF
> > map. Looking at the kernel implementation [0], it seems that the data
> > pages are mapped twice to make the implementation simpler and more
> > efficient. This seems to be a ring buffer peculiarity; the perf
> > buffer has no such mechanism. In the Falco project [1] we use huge
> > per-CPU buffers to collect almost all the syscalls that the system
> > issues, and the default size of each buffer is 8 MB. This means that
> > using the ring buffer approach on a system with 128 CPUs we will
> > have (128*8*2) MB, while with the perf buffer only (128*8) MB. The
>
> hum, IIUC it's not allocated twice, the pages are just mapped twice
> to cope with wrap-around samples, as described in the git log:
>
>   One interesting implementation bit, that significantly simplifies (and thus
>   speeds up as well) implementation of both producers and consumers is how data
>   area is mapped twice contiguously back-to-back in the virtual memory. This
>   allows to not take any special measures for samples that have to wrap around
>   at the end of the circular buffer data area, because the next page after the
>   last data page would be first data page again, and thus the sample will still
>   appear completely contiguous in virtual memory. See comment and a simple ASCII
>   diagram showing this visually in bpf_ringbuf_area_alloc().

yes, exactly, there is no duplication of memory, it's just mapped
twice to make working with records that wrap around simple and
efficient (see the small userspace sketch at the bottom of this mail)

> > issue is that this memory requirement could be too much for some
> > systems, and also in Kubernetes environments where there are strict
> > resource limits... Our current workaround is to share each ring
> > buffer between more than one CPU through a BPF_MAP_TYPE_ARRAY_OF_MAPS,
> > so for example we allocate one ring buffer for each CPU pair.
> > Unfortunately, this solution has a price, since we increase the
> > contention on the ring buffers, and as highlighted here [2], the
> > presence of multiple competing writers on the same buffer could
> > become a real bottleneck... Sorry for the long introduction; my
> > question is: are there any other approaches to manage such a
> > scenario? Will it be possible to use the ring buffer without the
> > kernel double mapping in the near future? The ring buffer has such
> > amazing features with respect to the perf buffer, but in a scenario
> > like the Falco one, where we have aggressive multiple producers,
> > this double mapping could become a limitation.
>
> AFAIK the BPF ring buffer can be used across CPUs, so you don't need
> to have an extra copy for each CPU if you don't really want to

seems like they do share, but only between pairs of CPUs. But nothing
prevents you from sharing between more than 2 CPUs, right? It's a
tradeoff between contention and overall memory usage (but as pointed
out, ringbuf doesn't use 2x more memory). Do you actually see a lot
of contention when sharing a ringbuf between 2 CPUs? There are
multiple applications that share a single ringbuf between all CPUs,
and no one has really complained about high contention so far. You'd
probably need to push tons of data non-stop, at which point I'd worry
about consumers not being able to keep up (and definitely not doing
much useful with all this data). But YMMV, of course.
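
FWIW, for anyone reading along, a minimal BPF-side sketch of that
per-CPU-pair sharing through BPF_MAP_TYPE_ARRAY_OF_MAPS could look
like the below. This is not Falco's actual code; the map names, event
layout, sizes, and the tracepoint are made up for illustration:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* example event layout, made up for this sketch */
struct event {
	__u32 cpu;
	__u64 ts;
};

/* inner map template: one 8 MB ringbuf */
struct ringbuf_map {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 8 * 1024 * 1024);
} rb0 SEC(".maps");

/* outer array: one ringbuf slot per CPU pair (64 slots for 128 CPUs) */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 64);
	__uint(key_size, sizeof(__u32));
	__array(values, struct ringbuf_map);
} ringbufs SEC(".maps") = {
	.values = { &rb0 },	/* userspace fills the remaining slots */
};

SEC("tp/raw_syscalls/sys_enter")
int handle_sys_enter(void *ctx)
{
	__u32 cpu = bpf_get_smp_processor_id();
	__u32 slot = cpu / 2;	/* two CPUs share one buffer */
	void *rb = bpf_map_lookup_elem(&ringbufs, &slot);
	struct event *e;

	if (!rb)
		return 0;

	e = bpf_ringbuf_reserve(rb, sizeof(*e), 0);
	if (!e)
		return 0;	/* buffer full: the record is dropped */

	e->cpu = cpu;
	e->ts = bpf_ktime_get_ns();
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

On the userspace side you'd create the remaining inner ringbufs (e.g.
with bpf_map_create()) and plug them into the outer array slots with
bpf_map_update_elem() before attaching the program. Sharing between
more than 2 CPUs is then just a matter of changing the divisor and
the slot count.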
> jirka
>
> > Thank you in advance for your time,
> > Andrea
> >
> > 0: https://github.com/torvalds/linux/blob/master/kernel/bpf/ringbuf.c#L107
> > 1: https://github.com/falcosecurity/falco
> > 2: https://patchwork.ozlabs.org/project/netdev/patch/20200529075424.3139988-5-andriin@xxxxxx/
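
P.S. since the double mapping came up: here's a small userspace
analogue of what bpf_ringbuf_area_alloc() does, sketched with
memfd_create() and two back-to-back mmap()s. Illustration only, not
the kernel code, and most error handling is trimmed:

#define _GNU_SOURCE		/* for memfd_create() */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 2 * sysconf(_SC_PAGESIZE);	/* tiny 2-page "data area" */
	int fd = memfd_create("ringbuf-demo", 0);
	char *area;

	if (fd < 0 || ftruncate(fd, size) < 0)
		return 1;

	/* reserve a virtual window twice the buffer size... */
	area = mmap(NULL, 2 * size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return 1;

	/* ...then map the same pages into both halves, back to back */
	mmap(area, size, PROT_READ | PROT_WRITE,
	     MAP_SHARED | MAP_FIXED, fd, 0);
	mmap(area + size, size, PROT_READ | PROT_WRITE,
	     MAP_SHARED | MAP_FIXED, fd, 0);

	/* a record written past the end of the buffer stays contiguous */
	strcpy(area + size - 3, "wrap");
	printf("%s\n", area + size - 3);	/* prints "wrap" */
	printf("%c\n", area[0]);		/* prints "p": same pages */
	return 0;
}

Both mappings point at the same physical pages, so the data area
costs its size only once; only the virtual address range is doubled,
and a record written past the end shows up contiguously at the start.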