Re: Question about bpf perfbuf/ringbuf: pinned in backend with overwriting

Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> · Tue, 19 Dec 2023 14:23:59 +0800

On Wed, Dec 13, 2023 at 03:35:19PM -0800, Andrii Nakryiko wrote:
> On Mon, Dec 11, 2023 at 4:39 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
> [...]
> > >>> Imagine a simple case: the bpf program output a log (some tcp
> > >>> statistics) to user every time a packet is received, and the user
> > >>> actively read the logs if he wants. I do not want to keep a user process
> > >>> alive, waiting for outputs of the buffer. User can read the buffer as
> > >>> need. BTW, the order does not matter.

Not sure if it's the same use case, but I'd imagine this would be quite
useful for debugging a hard-to-reproduce issue where little is known
(thus minimal filtering is applied and the volume of events is large).
You just want to gather as many details as possible about the events
that happen just before the issue occurs, and don't care about events
that happened much earlier.

> > >>> To conclude, I hope the buffer performs like relayfs: (1) no need for
> > >>> user process to receive logs, and the user may read at any time (and no
> > >>> wakeup would be better); (2) old data can be overwritten by new ones.
> > >>> 
> > >>> Currently, it seems that perfbuf and ringbuf cannot satisfy both: (i)
> > >>> ringbuf: only satisfies (1). However, if data arrive when the buffer is
> > >>> full, the new data will be lost, until the buffer is consumed. (ii)
> > >>> perfbuf: only satisfies (2). But user cannot access the buffer after the
> > >>> process who creates it (including perf_event.rb via mmap) exits.
> > >>> Specifically, I can use BPF_F_PRESERVE_ELEMS flag to keep the
> > >>> perf_events, but I do not know how to get the buffer again in a new
> > >>> process.
> > 
> > [...]
> > 
> > If it is indeed difficult with ringbuf, maybe I can implement a new type
> > of bpf map based on relay interface [1]? e.g., init relay during map
> > creating, write into it with bpf helper, and then user can access to it
> > in filesystem. I think it will be a simple but useful map for
> > overwritable data transfer.
> 
> I don't know much about relay, tbh. Give it a try, I guess.
> Alternatively, we need better and faster implementation of
> BPF_MAP_TYPE_QUEUE, which seems like the data structure that can
> support overwriting and generally be a fixed element size
> alternative/complement to BPF ringbuf.

Curious whether it is possible to reuse ftrace's trace buffer instead
(or its underlying ring buffer implementation at
kernel/trace/ring_buffer.c). AFAICT it satisfies both requirements that
Philo stated: (1) no need for a user process, as the buffer is accessible
through tracefs, and (2) it has an overwrite mode.
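
To make the difference between the two full-buffer policies concrete,
here is a toy userspace model (not kernel code, and not how
kernel/trace/ring_buffer.c or BPF ringbuf is actually implemented). The
names toy_rb, RB_CAP, and the toy_rb_*() functions are all made up for
illustration. With overwrite off, a full buffer rejects the new record,
which is the behaviour Philo described for ringbuf. With overwrite on,
the oldest record is discarded to make room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAP 4 /* hypothetical capacity, in fixed-size records */

struct toy_rb {
	uint32_t slots[RB_CAP];
	size_t head;    /* number of records written so far */
	size_t tail;    /* number of records consumed or discarded so far */
	bool overwrite; /* true: drop oldest when full; false: drop newest */
};

static void toy_rb_init(struct toy_rb *rb, bool overwrite)
{
	rb->head = 0;
	rb->tail = 0;
	rb->overwrite = overwrite;
}

/* Returns true if the record was stored, false if it was dropped. */
static bool toy_rb_write(struct toy_rb *rb, uint32_t val)
{
	assert(rb->head - rb->tail <= RB_CAP);

	if (rb->head - rb->tail == RB_CAP) {
		if (!rb->overwrite)
			return false; /* full: the new record is lost */
		rb->tail++;           /* full: discard the oldest record */
	}
	rb->slots[rb->head % RB_CAP] = val;
	rb->head++;
	return true;
}

/* Returns true and fills *out with the oldest record, if there is one. */
static bool toy_rb_read(struct toy_rb *rb, uint32_t *out)
{
	if (rb->head == rb->tail)
		return false;
	*out = rb->slots[rb->tail % RB_CAP];
	rb->tail++;
	return true;
}
```

Writing the values 1 through 6 into a 4-slot buffer leaves 1..4 in the
non-overwrite case and 3..6 in the overwrite case, so a late reader sees
the events closest to "now" only in overwrite mode.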

Furthermore, a natural feature request that would come after
overwriting support would be snapshotting, and that has already been
covered in ftrace.

Note: technically a BPF program could already write to ftrace's trace
buffer with the bpf_trace_vprintk() helper, but that goes through string
formatting and only allows writing into the global buffer.