On 05/14, Andrii Nakryiko wrote:
On Thu, May 14, 2020 at 10:33 AM <sdf@xxxxxxxxxx> wrote:
>
> On 05/13, Andrii Nakryiko wrote:
[...]
> > + * void bpf_ringbuf_submit(void *data)
> > + * Description
> > + * Submit reserved ring buffer sample, pointed to by *data*.
> > + * Return
> > + * Nothing.
> Even though you mention self-pacing properties, would it still
> make sense to add some argument to bpf_ringbuf_submit/bpf_ringbuf_output
> to indicate whether to wake up userspace or not? Maybe something like
> a threshold of number of outstanding events in the ringbuf after which
> we do the wakeup? A default of 0/1 would preserve the existing behavior.
>
> The example I can give is a control plane userspace thread that
> aggregates the events once a second; it doesn't care about millisecond
> resolution. With the current scheme, I suppose, if BPF generates events
> every 1ms, the userspace will be woken up 1000 times (if it can keep
> up). Most of the time, we don't really care and some buffering
> properties are desired.
perf buffer has a setting like this, and believe me, it's so confusing
and dangerous that I wouldn't want this to be exposed. Even though I
was aware of this behavior, I still had to debug and work around this
lack of wakeup a few times; it's a really, really confusing feature.
In your case, though, why wouldn't user-space poll data just once a
second, if it's not interested in getting data as fast as possible?
If I poll once per second I might lose the events if, for some reason,
there is a spike. I really want to have something like: "wakeup
userspace if the ringbuffer fill is over some threshold or
the last wakeup was too long ago". We currently do this via a percpu
cache map. IIRC, you've shared on lsfmmbpf that you do something like
that as well.
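
To make the policy concrete, here is a rough sketch in plain C (not BPF
code, and not anything from the patch). The struct, its fields, and
should_wakeup() are all hypothetical names, and the sketch assumes the
producer can somehow learn how much unconsumed data is in the buffer:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy state; none of these names come from the patch. */
struct wakeup_policy {
	uint64_t fill_threshold;  /* wake once this much data is unconsumed */
	uint64_t max_interval_ns; /* wake if the last wakeup is older than this */
	uint64_t last_wakeup_ns;  /* timestamp of the most recent wakeup */
};

/*
 * Returns true if the consumer should be woken now, and records the
 * wakeup time. The timeout only fires when there is something to read.
 */
static bool should_wakeup(struct wakeup_policy *p, uint64_t unconsumed,
			  uint64_t now_ns)
{
	bool over_threshold = unconsumed >= p->fill_threshold;
	bool too_long = unconsumed > 0 &&
			now_ns - p->last_wakeup_ns >= p->max_interval_ns;

	if (over_threshold || too_long) {
		p->last_wakeup_ns = now_ns;
		return true;
	}
	return false;
}
```

With a 1 second max_interval_ns and events every 1ms, this would wake
the consumer roughly once a second in the steady state, and sooner only
when a spike pushes the fill level past fill_threshold.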
So I was thinking about how I can use the new ringbuf to remove the unneeded
copies and help with the reordering, but I'm a bit concerned about
regressing on the number of wakeups.
Maybe having a flag like RINGBUF_NO_WAKEUP in bpf_ringbuf_submit()
will suffice? And if there is a helper or some way to obtain the
number of unconsumed items, I can implement my own flushing policy.