On Thu, May 14, 2020 at 2:13 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, May 14, 2020 at 1:53 PM <sdf@xxxxxxxxxx> wrote:
> >
> > On 05/14, Andrii Nakryiko wrote:
> > > On Thu, May 14, 2020 at 10:33 AM <sdf@xxxxxxxxxx> wrote:
> > > >
> > > > On 05/13, Andrii Nakryiko wrote:
> > > > [...]
> > > > >
> > > > > + * void bpf_ringbuf_submit(void *data)
> > > > > + * Description
> > > > > + * Submit reserved ring buffer sample, pointed to by *data*.
> > > > > + * Return
> > > > > + * Nothing.
> > > >
> > > > Even though you mention self-pacing properties, would it still
> > > > make sense to add some argument to bpf_ringbuf_submit/bpf_ringbuf_output
> > > > to indicate whether to wake up userspace or not? Maybe something like
> > > > a threshold of number of outstanding events in the ringbuf after which
> > > > we do the wakeup? A default of 0/1 would preserve the existing behavior.
> > > >
> > > > The example I can give is a control-plane userspace thread that
> > > > aggregates the events once a second; it doesn't care about millisecond
> > > > resolution. With the current scheme, I suppose, if BPF generates events
> > > > every 1ms, the userspace will be woken up 1000 times (if it can keep
> > > > up). Most of the time, we don't really care and some buffering
> > > > properties are desired.
> > >
> > > perf buffer has a setting like this, and believe me, it's so confusing
> > > and dangerous that I wouldn't want this to be exposed. Even though I
> > > was aware of this behavior, I still had to debug and work around this
> > > lack of wakeup a few times; it's a really, really confusing feature.
> > >
> > > In your case, though, why wouldn't user-space poll data just once a
> > > second, if it's not interested in getting data as fast as possible?
> >
> > If I poll once per second I might lose the events if, for some reason,
> > there is a spike. I really want to have something like: "wake up
> > userspace if the ringbuffer fill is over some threshold or
> > the last wakeup was too long ago". We currently do this via a percpu
> > cache map. IIRC, you've shared at lsfmmbpf that you do something like
> > that as well.
>
> Hm... I don't remember such a use case on our side. All applications I
> know of use default perf_buffer settings with no sampling.

Never mind, I might have misunderstood :-)

> > So I was thinking how I can use the new ringbuf to remove the unneeded
> > copies and help with the reordering, but I'm a bit concerned about
> > regressing on the number of wakeups.
> >
> > Maybe having a flag like RINGBUF_NO_WAKEUP in bpf_ringbuf_submit()
> > will suffice? And if there is a helper or some way to obtain the
> > number of unconsumed items, I can implement my own flushing policy.
>
> Ok, I guess giving the application control at each discard/commit makes
> for ultimate flexibility. Let me add a flags argument to commit/discard
> and allow specifying a NO_WAKEUP flag. As for the count of unconsumed events
> -- that would be a bit expensive to maintain. How about the amount of data
> that's not consumed? It's obviously going to be racy, but returning
> (producer_pos - consumer_pos) should be sufficient for such
> smart and best-effort heuristics? WDYT?

Awesome, SGTM! Racy is fine (I don't see how we can make it non-racy
as well). The amount of data instead of the number of items is also
fine since I know the size of the buffer.
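The flushing policy described in the thread ("wake up userspace if the fill is over some threshold or the last wakeup was too long ago"), combined with the racy (producer_pos - consumer_pos) estimate, can be sketched as a small userspace simulation in C. This is a hedged illustration, not kernel code: rb_model, rb_avail_data(), wakeup_policy, should_wakeup() and simulate() are all hypothetical names, and in a real BPF program a "no wakeup" decision would be expressed through the proposed flag on bpf_ringbuf_submit() rather than by calling anything here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ring buffer counters discussed above. Both
 * positions only grow, so producer_pos - consumer_pos is the number of
 * bytes produced but not yet consumed (unsigned math is wrap-safe). */
struct rb_model {
	uint64_t producer_pos;
	uint64_t consumer_pos;
	uint64_t size; /* capacity in bytes */
};

/* Best-effort amount of unconsumed data. In a real ring buffer the two
 * counters are read without synchronization, so the value can be stale;
 * that is acceptable for a wakeup heuristic. */
static uint64_t rb_avail_data(const struct rb_model *rb)
{
	return rb->producer_pos - rb->consumer_pos;
}

/* Hypothetical user-defined policy: wake userspace if the buffer is at
 * least fill_pct percent full, or if the last wakeup was at least
 * max_delay_ns ago. A fill_pct above 100 disables the fill trigger. */
struct wakeup_policy {
	uint64_t fill_pct;
	uint64_t max_delay_ns;
	uint64_t last_wakeup_ns;
};

static bool should_wakeup(struct wakeup_policy *p, const struct rb_model *rb,
			  uint64_t now_ns)
{
	bool fill_hit = rb_avail_data(rb) * 100 >= rb->size * p->fill_pct;
	bool time_hit = now_ns - p->last_wakeup_ns >= p->max_delay_ns;

	if (fill_hit || time_hit) {
		p->last_wakeup_ns = now_ns;
		return true;
	}
	return false; /* i.e. submit with a "no wakeup" flag */
}

struct sim_result {
	unsigned wakeups;
	unsigned dropped;
};

/* Simulate n fixed-size events arriving every interval_ns. After each
 * successful submit the policy decides whether to wake the consumer; a
 * woken consumer drains everything. Events that do not fit are dropped. */
static struct sim_result simulate(struct wakeup_policy p, uint64_t size,
				  unsigned n, uint64_t event_bytes,
				  uint64_t interval_ns)
{
	struct rb_model rb = { 0, 0, size };
	struct sim_result res = { 0, 0 };

	assert(event_bytes <= size);
	for (unsigned i = 0; i < n; i++) {
		uint64_t now = (uint64_t)i * interval_ns;

		if (rb_avail_data(&rb) + event_bytes > rb.size) {
			res.dropped++; /* reservation would fail */
			continue;
		}
		rb.producer_pos += event_bytes; /* reserve + submit */
		if (should_wakeup(&p, &rb, now)) {
			res.wakeups++;
			rb.consumer_pos = rb.producer_pos; /* consumer drains */
		}
	}
	return res;
}
```

With a 4096-byte buffer, 64-byte events every 1 ms and a 50% fill threshold plus a one-second timer, 1000 events cause 31 wakeups instead of one per event, with nothing dropped. With the fill trigger disabled and only the one-second timer, the same burst fills the buffer after 64 events and the rest are dropped, which is the spike concern raised above. Note that the timer check in this sketch only runs when an event is submitted, so a quiet producer can leave data sitting in the buffer; the consumer would still want to poll with a timeout. For comparison, mainline kernels later exposed this kind of control as the BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP flags of bpf_ringbuf_submit() and as bpf_ringbuf_query() with BPF_RB_AVAIL_DATA.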