Re: [PATCH v5 3/4] bpf: Add libbpf logic for user-space ring buffer

David Vernet <void@xxxxxxxxxxxxx> · Mon, 19 Sep 2022 16:00:05 -0500

On Mon, Sep 19, 2022 at 03:22:09PM -0500, David Vernet wrote:

[...]

> > > +       timeout_ns = timeout_ms * ns_per_ms;
> > > +       do {
> > > +               __u64 ns_remaining = timeout_ns - ns_elapsed;
> > > +               int cnt, ms_remaining;
> > > +               void *sample;
> > > +               struct timespec curr;
> > > +
> > > +               sample = user_ring_buffer__reserve(rb, size);
> > > +               if (sample)
> > > +                       return sample;
> > > +               else if (errno != ENOSPC)
> > > +                       return NULL;
> > > +
> > > +               ms_remaining = timeout_ms == -1 ? -1 : ns_remaining / ns_per_ms;
> > 
> > ok, so you've special-cased timeout_ms == -1 but still didn't do
> > max(0, ns_remaining). Can you prove that ns_elapsed will never be
> > bigger than timeout_ns due to various delays in waking up this thread?
> > If not, let's please have max(0) otherwise we can accidentally
> > epoll_wait(-1).
> 
> Yes you're right, this was an oversight. Thanks for catching this!

Wait, actually, this can't happen because of the while check at the end of
the loop:

while (ns_elapsed <= timeout_ns)

So I don't think the max is necessary to prevent underflowing, but I do
think we need to have one more attempt to invoke
user_ring_buffer__reserve() at the end of the function to account for
wakeup delays after epoll_wait() returns. Otherwise, we might think we've
timed out when there's actually a sample left in the buffer. I also still
think your suggestion below for cleaning up makes sense, so I'll still add
it in v6, but I think I can leave off the max() call.

> 
> > > +               /* The kernel guarantees at least one event notification
> > > +                * delivery whenever at least one sample is drained from the
> > > +                * ring buffer in an invocation to bpf_ringbuf_drain(). Other
> > > +                * additional events may be delivered at any time, but only one
> > > +                * event is guaranteed per bpf_ringbuf_drain() invocation,
> > > +                * provided that a sample is drained, and the BPF program did
> > > +                * not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain().
> > > +                */
> > > +               cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
> > > +               if (cnt < 0)
> > > +                       return NULL;
> > > +
> > > +               if (timeout_ms == -1)
> > > +                       continue;
> > > +
> > > +               err = clock_gettime(CLOCK_MONOTONIC, &curr);
> > > +               if (err)
> > > +                       return NULL;
> > > +
> > > +               ns_elapsed = ns_elapsed_timespec(&start, &curr);
> > 
> > nit: if you move re-calculation of ms_remaining and ns_remaining to
> > here, I think overall loop logic will be even more straightforwad. You
> > can initialize ms_remaining to -1 if timeout_ms < 0 and never
> > recalculate it, right? Note that you can also do ns_elapsed conversion
> > to ms right here and then keep everything else in ms (so no need for
> > timeout_ns, ns_remaining, etc).
> 
> Sounds good, let me give this a shot in v6.
> 
> Thanks for another detailed review!