Re: libbpf ringbuf manager starvation

On Tue, Jan 19, 2021 at 7:51 AM Gilad Reti <gilad.reti@xxxxxxxxx> wrote:
>
> Hello there,
>

Hi,

> When playing with the (relatively) new ringbuf api we encountered
> something that we believe can be an interesting use case.
> When registering multiple ringbufs to the same ringbuf manager, one of
> which is highly active, other ringbufs may starve. Since libbpf
> (e)polls on all the managed ringbufs at once and then tries to read
> *as many samples as it can* from ready ringbufs, it may get stuck
> indefinitely on one of them, not being able to process the other.
> We know that the current ringbuf api exposes the epoll_fd so that one
> can implement the epoll logic on one's own, but this sounds to us like a
> not-so-advanced use case that may be worth taking care of specifically.
> Does allowing the caller to specify a maximum number of samples to
> consume sound like a reasonable addition to the ringbuf API?

Did you actually run into such a situation in practice? If you have a
BPF program producing so much data so fast that user-space can't keep
up, then it sounds like a suboptimal use case for BPF ringbuf.

But nevertheless, my advice for your situation is to use two instances
of libbpf's ring_buffer: one for the super-busy ringbuf, and another for
everything else. Or you can even have one instance per ringbuf. It's
very flexible.

As for having this limit, it's not so simple, unfortunately. The
contract between kernel, epoll, and libbpf is that user-space will
always keep consuming until there are no more items left. Internally,
the kernel's BPF ringbuf relies on that to skip unnecessary epoll
notifications. If you don't consume all items and then attempt to
(e)poll again, you'll never get another notification (unless you
force-notify from your BPF program, but that's an advanced use case).
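
To make that contract concrete, here is a toy model in plain C. It is
not kernel or libbpf code: the names toy_ringbuf, toy_commit,
toy_consume and toy_wakeups_with_cap are made up for illustration, and
the wakeup rule (notify only if the consumer had already caught up to
the record being committed, or if the producer forces it) is a
simplified assumption based on the description above.

```c
#include <stdbool.h>

/* Toy model only: tracks positions, not sample payloads. */
struct toy_ringbuf {
    unsigned long producer_pos; /* records committed so far */
    unsigned long consumer_pos; /* records consumed so far */
    unsigned long wakeups;      /* notifications delivered to the poller */
};

/*
 * Commit one record. ASSUMPTION of this model: a notification is sent
 * only if the consumer had already consumed everything before this
 * record, or if the producer forces it.
 */
static void toy_commit(struct toy_ringbuf *rb, bool force_wakeup)
{
    unsigned long rec_pos = rb->producer_pos++;

    if (force_wakeup || rb->consumer_pos == rec_pos)
        rb->wakeups++;
}

/* Consume up to max records (max == 0 means "until empty"). */
static unsigned long toy_consume(struct toy_ringbuf *rb, unsigned long max)
{
    unsigned long n = 0;

    while (rb->consumer_pos < rb->producer_pos && (max == 0 || n < max)) {
        rb->consumer_pos++;
        n++;
    }
    return n;
}

/*
 * Hypothetical scenario: commit 3 records, consume with the given cap,
 * commit 1 more record, and return the total number of wakeups seen.
 */
static unsigned long toy_wakeups_with_cap(unsigned long cap, bool force_last)
{
    struct toy_ringbuf rb = {0};

    for (int i = 0; i < 3; i++)
        toy_commit(&rb, false); /* only the first commit wakes the poller */
    toy_consume(&rb, cap);
    toy_commit(&rb, force_last);
    return rb.wakeups;
}
```

In this model toy_wakeups_with_cap(0, false) returns 2: the consumer
drained everything, so the late record triggers a second wakeup. With a
cap, toy_wakeups_with_cap(2, false) returns 1: one record was left
behind, the next commit stays silent, and a poller blocked in epoll
would keep waiting. Passing force_last = true corresponds to the
force-notify escape hatch and brings the count back to 2.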

We could do a round-robin across all registered ringbufs within the
ring_buffer instance in ring_buffer__poll()/ring_buffer__consume(),
but I think it's over-designing for a quite unusual case.


>
> Thanks


