On Fri, Dec 15, 2023 at 2:10 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2023/12/14 07:35, Andrii Nakryiko wrote:
> > On Mon, Dec 11, 2023 at 4:39 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2023/12/9 06:32, Andrii Nakryiko wrote:
> >>> On Thu, Dec 7, 2023 at 6:49 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 07/12/2023 13:15, Philo Lu wrote:
> >>>>> Hi all. I have a question about using perfbuf/ringbuf in bpf. I
> >>>>> would appreciate any advice.
> >>>>>
> >>>>> Imagine a simple case: the bpf program outputs a log (some tcp
> >>>>> statistics) to the user every time a packet is received, and the
> >>>>> user actively reads the logs if he wants. I do not want to keep a
> >>>>> user process alive, waiting for outputs of the buffer. The user
> >>>>> can read the buffer as needed. BTW, the order does not matter.
> >>>>>
> >>>>> To conclude, I hope the buffer performs like relayfs: (1) no need
> >>>>> for a user process to receive logs, and the user may read at any
> >>>>> time (and no wakeup would be better); (2) old data can be
> >>>>> overwritten by new ones.
> >>>>>
> >>>>> Currently, it seems that perfbuf and ringbuf cannot satisfy both:
> >>>>> (i) ringbuf: only satisfies (1). However, if data arrive when the
> >>>>> buffer is full, the new data will be lost until the buffer is
> >>>>> consumed. (ii) perfbuf: only satisfies (2). But the user cannot
> >>>>> access the buffer after the process that created it (including
> >>>>> perf_event.rb via mmap) exits. Specifically, I can use the
> >>>>> BPF_F_PRESERVE_ELEMS flag to keep the perf_events, but I do not
> >>>>> know how to get the buffer again in a new process.
> >>>>>
> >>>>> In my opinion, this can be solved by either of the following: (a)
> >>>>> add overwrite support in ringbuf (maybe a new flag for reserve),
> >>>>> but we have to address synchronization between kernel and user
> >>>>> space, especially with variable data sizes, because when
> >>>>> overwriting occurs the kernel has to update the consumer position
> >>>>> too; (b) implement map_fd_sys_lookup_elem for perfbuf to expose
> >>>>> fds to user space via the map_lookup_elem syscall, plus a
> >>>>> mechanism to preserve perf_event->rb when the process exits
> >>>>> (otherwise the buffer will be freed by perf_mmap_close). I am not
> >>>>> sure whether they are feasible, or which is better. If not,
> >>>>> perhaps we can develop another mechanism to achieve this?
> >>>>>
> >>>>
> >>>> There was an RFC a while back focused on supporting BPF ringbuf
> >>>> over-writing [1]; at the time, Andrii noted some potential issues
> >>>> that might be exposed by doing multiple ringbuf reserves to
> >>>> overfill the buffer within the same program.
> >>>>
> >>>
> >>> Correct. I don't think it's possible to correctly and safely support
> >>> overwriting with a BPF ringbuf that has variable-sized elements.
> >>>
> >>> We'll need to implement an MPMC ringbuf (probably with a fixed
> >>> element size) to be able to support this.
> >>>
> >>
> >> Thank you very much!
> >>
> >> If it is indeed difficult with ringbuf, maybe I can implement a new
> >> type of bpf map based on the relay interface [1]? E.g., init relay
> >> during map creation, write into it with a bpf helper, and then the
> >> user can access it in the filesystem. I think it would be a simple
> >> but useful map for overwritable data transfer.
> >
> > I don't know much about relay, tbh. Give it a try, I guess.
> > Alternatively, we need a better and faster implementation of
> > BPF_MAP_TYPE_QUEUE, which seems like the data structure that can
> > support overwriting and generally be a fixed-element-size
> > alternative/complement to BPF ringbuf.
> >
>
> Thank you for your reply. I am afraid BPF_MAP_TYPE_QUEUE cannot get rid
> of locking overheads with concurrent reading and writing by design, and

I disagree, I think [0] from Dmitry Vyukov is one way to implement a
lock-free BPF_MAP_TYPE_QUEUE. I don't know how easy it would be to
implement overwriting support, but it would be worth considering. (A
rough user-space sketch of the algorithm is appended at the end of this
mail.)

[0] https://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue

> a lockless buffer like relay fits our case better. So I will try it :)
>
> >>
> >> [1]
> >> https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst
> >>>> Alan
> >>>>
> >>>> [1]
> >>>> https://lore.kernel.org/lkml/20220906195656.33021-2-flaniel@xxxxxxxxxxxxxxxxxxx/
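
For reference, here is a minimal user-space C11 sketch of the bounded
MPMC queue from [0], with a fixed-size u64 payload. It is just the
published algorithm transcribed, not kernel/BPF code; the names
(mpmc_queue, mpmc_enqueue, ...) are made up for illustration, and it
does not attempt the overwriting part:

/* Illustrative user-space transcription of the bounded MPMC queue
 * from [0]; all names here are made up for this sketch.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One slot of the queue. The per-cell sequence number acts as a
 * ticket: it tells producers and consumers whether the cell is free,
 * full, or still in flight.
 */
struct cell {
	atomic_size_t seq;
	uint64_t data;          /* fixed-size payload */
};

struct mpmc_queue {
	struct cell *buf;
	size_t mask;            /* capacity - 1; capacity is a power of two */
	atomic_size_t enqueue_pos;
	atomic_size_t dequeue_pos;
};

static bool mpmc_init(struct mpmc_queue *q, size_t capacity)
{
	if (capacity < 2 || (capacity & (capacity - 1)))
		return false;   /* need a power-of-two capacity */
	q->buf = calloc(capacity, sizeof(*q->buf));
	if (!q->buf)
		return false;
	for (size_t i = 0; i < capacity; i++)
		atomic_init(&q->buf[i].seq, i);  /* cell i is free for pos i */
	q->mask = capacity - 1;
	atomic_init(&q->enqueue_pos, 0);
	atomic_init(&q->dequeue_pos, 0);
	return true;
}

static bool mpmc_enqueue(struct mpmc_queue *q, uint64_t data)
{
	size_t pos = atomic_load_explicit(&q->enqueue_pos, memory_order_relaxed);

	for (;;) {
		struct cell *c = &q->buf[pos & q->mask];
		size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
		intptr_t dif = (intptr_t)seq - (intptr_t)pos;

		if (dif == 0) {
			/* Cell is free for this position; try to claim it. */
			if (atomic_compare_exchange_weak_explicit(
					&q->enqueue_pos, &pos, pos + 1,
					memory_order_relaxed, memory_order_relaxed)) {
				c->data = data;
				/* Publish: consumers wait for seq == pos + 1. */
				atomic_store_explicit(&c->seq, pos + 1,
						      memory_order_release);
				return true;
			}
			/* Failed CAS reloaded pos; just retry. */
		} else if (dif < 0) {
			return false;   /* queue full */
		} else {
			/* Another producer raced ahead; refresh pos. */
			pos = atomic_load_explicit(&q->enqueue_pos,
						   memory_order_relaxed);
		}
	}
}

static bool mpmc_dequeue(struct mpmc_queue *q, uint64_t *data)
{
	size_t pos = atomic_load_explicit(&q->dequeue_pos, memory_order_relaxed);

	for (;;) {
		struct cell *c = &q->buf[pos & q->mask];
		size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
		intptr_t dif = (intptr_t)seq - (intptr_t)(pos + 1);

		if (dif == 0) {
			/* Cell holds data for this position; try to claim it. */
			if (atomic_compare_exchange_weak_explicit(
					&q->dequeue_pos, &pos, pos + 1,
					memory_order_relaxed, memory_order_relaxed)) {
				*data = c->data;
				/* Recycle the cell for the producer one lap ahead. */
				atomic_store_explicit(&c->seq, pos + q->mask + 1,
						      memory_order_release);
				return true;
			}
		} else if (dif < 0) {
			return false;   /* queue empty */
		} else {
			pos = atomic_load_explicit(&q->dequeue_pos,
						   memory_order_relaxed);
		}
	}
}

Note that a full queue simply makes mpmc_enqueue() return false.
Overwriting would mean producers advancing dequeue_pos and reclaiming
the oldest cell themselves while racing with consumers, which is where
it would get tricky.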