On Thu, Dec 7, 2023 at 6:49 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 07/12/2023 13:15, Philo Lu wrote:
> > Hi all. I have a question when using perfbuf/ringbuf in bpf. I will
> > appreciate it if you give me any advice.
> >
> > Imagine a simple case: the bpf program outputs a log (some tcp
> > statistics) to user every time a packet is received, and the user
> > actively reads the logs if he wants. I do not want to keep a user process
> > alive, waiting for outputs of the buffer. User can read the buffer as
> > needed. BTW, the order does not matter.
> >
> > To conclude, I hope the buffer performs like relayfs: (1) no need for
> > user process to receive logs, and the user may read at any time (and no
> > wakeup would be better); (2) old data can be overwritten by new ones.
> >
> > Currently, it seems that perfbuf and ringbuf cannot satisfy both: (i)
> > ringbuf: only satisfies (1). However, if data arrive when the buffer is
> > full, the new data will be lost, until the buffer is consumed. (ii)
> > perfbuf: only satisfies (2). But user cannot access the buffer after the
> > process that creates it (including perf_event.rb via mmap) exits.
> > Specifically, I can use BPF_F_PRESERVE_ELEMS flag to keep the
> > perf_events, but I do not know how to get the buffer again in a new
> > process.
> >
> > In my opinion, this can be solved by either of the following: (a) add
> > overwrite support in ringbuf (maybe a new flag for reserve), but we have
> > to address synchronization between kernel and user, especially under
> > variable data size, because when overwriting occurs, kernel has to
> > update the consumer position too; (b) implement map_fd_sys_lookup_elem for
> > perfbuf to expose fds to user via map_lookup_elem syscall, and a
> > mechanism is needed to preserve perf_event->rb when process exits
> > (otherwise the buffer will be freed by perf_mmap_close). I am not sure
> > if they are feasible, and which is better. If not, perhaps we can
> > develop another mechanism to achieve this?
> >
>
> There was an RFC a while back focused on supporting BPF ringbuf
> over-writing [1]; at the time, Andrii noted some potential issues that
> might be exposed by doing multiple ringbuf reserves to overfill the
> buffer within the same program.
>

Correct. I don't think it's possible to correctly and safely support
overwriting with a BPF ringbuf that has variable-sized elements. We'll
need to implement an MPMC ringbuf (probably with fixed-size elements)
to be able to support this.

> Alan
>
> [1]
> https://lore.kernel.org/lkml/20220906195656.33021-2-flaniel@xxxxxxxxxxxxxxxxxxx/
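
To illustrate why fixed-size elements make overwriting easier to reason
about, here is a minimal user-space sketch of an overwrite-oldest ring.
It is not the BPF ringbuf implementation and it is not MPMC: it is
single-threaded with no synchronization at all. The names ow_ring,
ow_ring_push, ow_ring_pop, SLOT_COUNT and SLOT_SIZE are made up for
this example. The point is only that with fixed-size slots, dropping
the oldest record is a plain index increment, whereas with
variable-sized records the writer would have to find the next record
boundary while a reader may still be looking at the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 4u   /* hypothetical capacity, must be a power of two */
#define SLOT_SIZE  16u  /* hypothetical fixed record size in bytes */

struct ow_ring {
    uint64_t head;      /* total number of records ever written */
    uint64_t tail;      /* sequence number of the oldest readable record */
    unsigned char slots[SLOT_COUNT][SLOT_SIZE];
};

static void ow_ring_init(struct ow_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/*
 * Store one record of at most SLOT_SIZE bytes. If the ring is full, the
 * oldest record is discarded first. Returns 1 if a record was
 * overwritten, 0 otherwise.
 */
static int ow_ring_push(struct ow_ring *r, const void *rec, size_t len)
{
    int overwrote = 0;
    unsigned char *slot;

    assert(len <= SLOT_SIZE);

    if (r->head - r->tail == SLOT_COUNT) {
        /* Full: with fixed-size slots, dropping the oldest is tail++. */
        r->tail++;
        overwrote = 1;
    }

    slot = r->slots[r->head & (SLOT_COUNT - 1)];
    memset(slot, 0, SLOT_SIZE);
    memcpy(slot, rec, len);
    r->head++;
    return overwrote;
}

/*
 * Copy the oldest record (SLOT_SIZE bytes) into out and remove it.
 * Returns 1 on success, 0 if the ring is empty.
 */
static int ow_ring_pop(struct ow_ring *r, void *out)
{
    if (r->tail == r->head)
        return 0;

    memcpy(out, r->slots[r->tail & (SLOT_COUNT - 1)], SLOT_SIZE);
    r->tail++;
    return 1;
}
```

With these definitions, pushing six records into the four-slot ring
discards the first two, and a reader that shows up later gets records
three through six in order. A real kernel/user version would still need
a way for the reader to detect that a slot was overwritten while it was
being copied, which is the part this sketch leaves out.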