Re: Question about bpf perfbuf/ringbuf: pinned in backend with overwriting


On 2023/12/14 07:35, Andrii Nakryiko wrote:
On Mon, Dec 11, 2023 at 4:39 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:



On 2023/12/9 06:32, Andrii Nakryiko wrote:
On Thu, Dec 7, 2023 at 6:49 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:

On 07/12/2023 13:15, Philo Lu wrote:
Hi all. I have a question about using perfbuf/ringbuf in BPF, and I
would appreciate any advice.

Imagine a simple case: the BPF program outputs a log record (some TCP
statistics) to user space every time a packet is received, and the user
reads the logs only when he wants to. I do not want to keep a user
process alive, waiting for output from the buffer. The user can read the
buffer as needed. BTW, the order does not matter.

To summarize, I hope the buffer behaves like relayfs: (1) no user
process is needed to receive logs, and the user may read at any time
(ideally with no wakeups); (2) old data can be overwritten by new data.

Currently, it seems that neither perfbuf nor ringbuf satisfies both: (i)
ringbuf: only satisfies (1). If data arrive when the buffer is full, the
new data are lost until the buffer is consumed. (ii) perfbuf: only
satisfies (2). The user cannot access the buffer after the process that
created it (including perf_event.rb via mmap) exits. Specifically, I can
use the BPF_F_PRESERVE_ELEMS flag to keep the perf_events, but I do not
know how to get the buffer again in a new process.

In my opinion, this can be solved by either of the following: (a) add
overwrite support to ringbuf (maybe a new flag for reserve), but we have
to address synchronization between kernel and user space, especially
with variable data sizes, because when overwriting occurs, the kernel
has to update the consumer position too; (b) implement
map_fd_sys_lookup_elem for perfbuf to expose fds to the user via the
map_lookup_elem syscall, plus a mechanism to preserve perf_event->rb
when the process exits (otherwise the buffer will be freed by
perf_mmap_close). I am not sure whether these are feasible, or which is
better. If neither is, perhaps we can develop another mechanism to
achieve this?
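
To illustrate the synchronization concern in option (a), here is a
minimal single-threaded user-space sketch. It is not kernel code and not
the BPF ringbuf implementation. It assumes a hypothetical record layout
(a 1-byte length header followed by the payload), and the names
var_ring, var_ring_write and RB_SIZE are invented for illustration. It
shows that with variable-sized records an overwriting producer has to
walk the consumer position forward one whole record at a time, which is
the update that becomes hard to do safely once a real consumer runs
concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 64u /* hypothetical buffer size in bytes, power of two */

struct var_ring {
    uint8_t  data[RB_SIZE];
    uint32_t prod; /* producer position: total bytes ever written */
    uint32_t cons; /* consumer position: always at a record header */
};

/* Read one byte at a logical position, wrapping around the buffer. */
static uint8_t rb_byte(const struct var_ring *rb, uint32_t pos)
{
    return rb->data[pos & (RB_SIZE - 1)];
}

/*
 * Append one record (1-byte length header + payload), overwriting the
 * oldest records if there is not enough room.
 * Returns the number of old records dropped, or -1 if the record can
 * never fit.
 */
static int var_ring_write(struct var_ring *rb, const void *buf, uint8_t len)
{
    const uint8_t *src = buf;
    uint32_t need = 1u + len;
    int dropped = 0;

    assert(rb != NULL && buf != NULL);
    if (need > RB_SIZE)
        return -1;

    /*
     * Overwrite path: the producer itself moves the consumer position,
     * and it can only do so by parsing each old record's header.
     */
    while (rb->prod - rb->cons + need > RB_SIZE) {
        rb->cons += 1u + rb_byte(rb, rb->cons);
        dropped++;
    }

    rb->data[rb->prod & (RB_SIZE - 1)] = len;
    for (uint32_t i = 0; i < len; i++)
        rb->data[(rb->prod + 1u + i) & (RB_SIZE - 1)] = src[i];
    rb->prod += need;
    return dropped;
}
```

In this sketch nothing stops a reader from being in the middle of the
record that the producer is about to drop; a real implementation would
need some protocol between kernel and user space to cover that case.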


There was an RFC a while back focused on supporting BPF ringbuf
over-writing [1]; at the time, Andrii noted some potential issues that
might be exposed by doing multiple ringbuf reserves to overfill the
buffer within the same program.


Correct. I don't think it's possible to correctly and safely support
overwriting with BPF ringbuf that has variable-sized elements.

We'll need to implement an MPMC ringbuf (probably with fixed-size
elements) to be able to support this.
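
As a rough illustration of why fixed-size elements make overwriting
easier, here is a small user-space sketch. It is single-producer and
single-threaded, so it is not the MPMC ringbuf mentioned above, and it
is not an existing kernel API. The names fixed_ring, fixed_ring_push,
fixed_ring_read, SLOT_COUNT and SLOT_SIZE are hypothetical. With
fixed-size slots the writer never has to touch reader state: the reader
keeps its own cursor and can work out from the head counter how many
records were overwritten before it got to them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 4u  /* hypothetical number of slots */
#define SLOT_SIZE  16u /* hypothetical fixed record size in bytes */

struct fixed_ring {
    uint8_t  slots[SLOT_COUNT][SLOT_SIZE];
    uint64_t head; /* total number of records ever written */
};

/* Write one record, silently overwriting the oldest slot when full. */
static void fixed_ring_push(struct fixed_ring *rb, const void *rec)
{
    assert(rb != NULL && rec != NULL);
    memcpy(rb->slots[rb->head % SLOT_COUNT], rec, SLOT_SIZE);
    rb->head++;
}

/*
 * Copy up to max records into out, oldest first, starting at *cursor.
 * *lost is set to the number of records that were overwritten before
 * this reader saw them. Returns the number of records copied.
 */
static unsigned fixed_ring_read(const struct fixed_ring *rb, uint64_t *cursor,
                                uint8_t out[][SLOT_SIZE], unsigned max,
                                uint64_t *lost)
{
    uint64_t oldest = rb->head > SLOT_COUNT ? rb->head - SLOT_COUNT : 0;
    unsigned n = 0;

    *lost = 0;
    if (*cursor < oldest) {
        /* The writer lapped us: skip ahead to the oldest retained record. */
        *lost = oldest - *cursor;
        *cursor = oldest;
    }
    while (*cursor < rb->head && n < max) {
        memcpy(out[n++], rb->slots[*cursor % SLOT_COUNT], SLOT_SIZE);
        (*cursor)++;
    }
    return n;
}
```

A concurrent version would still need a way for the reader to detect a
slot being rewritten while it is copied (for example a per-slot sequence
number), which this sketch leaves out.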


Thank you very much!

If it is indeed difficult with ringbuf, maybe I can implement a new type
of BPF map based on the relay interface [1]? E.g., initialize relay
during map creation, write into it with a BPF helper, and then the user
can access it through the filesystem. I think it would be a simple but
useful map for overwritable data transfer.

I don't know much about relay, tbh. Give it a try, I guess.
Alternatively, we need a better and faster implementation of
BPF_MAP_TYPE_QUEUE, which seems like the data structure that can
support overwriting and generally be a fixed-element-size
alternative/complement to BPF ringbuf.


Thank you for your reply. I am afraid that BPF_MAP_TYPE_QUEUE, by
design, cannot avoid locking overhead under concurrent reading and
writing, and a lockless buffer like relay fits our case better. So I
will try it :)


[1]
https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst

Alan

[1]
https://lore.kernel.org/lkml/20220906195656.33021-2-flaniel@xxxxxxxxxxxxxxxxxxx/



