On Tue, Aug 30, 2022 at 6:42 AM David Vernet <void@xxxxxxxxxxxxx> wrote:
>
> On Wed, Aug 24, 2022 at 02:58:31PM -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > > +LIBBPF_API struct user_ring_buffer *
> > > +user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts);
> > > +LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb,
> > > +					   __u32 size);
> > > +
> > > +LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
> > > +						    __u32 size,
> > > +						    int timeout_ms);
> > > +LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb,
> > > +					 void *sample);
> > > +LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb,
> > > +					  void *sample);
> > > +LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb);
> > > +
> [...]
> > > +void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms)
> > > +{
> > > +	int ms_elapsed = 0, err;
> > > +	struct timespec start;
> > > +
> > > +	if (timeout_ms < 0 && timeout_ms != -1)
> > > +		return errno = EINVAL, NULL;
> > > +
> > > +	if (timeout_ms != -1) {
> > > +		err = clock_gettime(CLOCK_MONOTONIC, &start);
> > > +		if (err)
> > > +			return NULL;
> > > +	}
> > > +
> > > +	do {
> > > +		int cnt, ms_remaining = timeout_ms - ms_elapsed;
> >
> > let's max(0, timeout_ms - ms_elapsed) to avoid negative ms_remaining
> > in some edge timing cases
>
> We actually want to have a negative ms_remaining if timeout_ms is -1. -1
> in epoll_wait() specifies an infinite timeout. If we were to round up to
> 0, it wouldn't block at all.

Then I think it's better to special-case timeout_ms == -1. My worry, as I
mentioned, is the edge-case timing where ms_elapsed ends up bigger than
timeout_ms, so ms_remaining goes below 0 and we stay blocked for a long
time. So I think it's best to pass `timeout_ms < 0 ? -1 : ms_remaining`
and still do the max. But I haven't checked v5 yet, so if you've already
addressed this, that's fine.
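
Something along these lines (an untested sketch; epoll_timeout_ms() is a
hypothetical name, not an existing libbpf helper): -1 is passed through
untouched, and any finite timeout is clamped at 0 before it reaches
epoll_wait().

```c
/* Hypothetical helper (not part of libbpf): compute the timeout to pass to
 * epoll_wait() for one iteration of the blocking reserve loop.
 *
 * - A negative timeout_ms (the caller asked for -1) maps to -1, which
 *   epoll_wait() treats as "block indefinitely".
 * - Otherwise the remaining time is clamped at 0, so a late wakeup
 *   (ms_elapsed > timeout_ms) polls once instead of going negative.
 */
static int epoll_timeout_ms(int timeout_ms, int ms_elapsed)
{
	int ms_remaining;

	if (timeout_ms < 0)
		return -1;

	ms_remaining = timeout_ms - ms_elapsed;
	return ms_remaining > 0 ? ms_remaining : 0;
}
```

The loop would then call
epoll_wait(rb->epoll_fd, &rb->event, 1, epoll_timeout_ms(timeout_ms, ms_elapsed)),
keeping the infinite-wait case and the finite case separate.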
>
> > > +		void *sample;
> > > +		struct timespec curr;
> > > +
> > > +		sample = user_ring_buffer__reserve(rb, size);
> > > +		if (sample)
> > > +			return sample;
> > > +		else if (errno != ENODATA)
> > > +			return NULL;
> > > +
> > > +		/* The kernel guarantees at least one event notification
> > > +		 * delivery whenever at least one sample is drained from the
> > > +		 * ringbuffer in an invocation to bpf_ringbuf_drain(). Other
> > > +		 * additional events may be delivered at any time, but only one
> > > +		 * event is guaranteed per bpf_ringbuf_drain() invocation,
> > > +		 * provided that a sample is drained, and the BPF program did
> > > +		 * not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain().
> > > +		 */
> > > +		cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
> > > +		if (cnt < 0)
> > > +			return NULL;
> > > +
> > > +		if (timeout_ms == -1)
> > > +			continue;
> > > +
> > > +		err = clock_gettime(CLOCK_MONOTONIC, &curr);
> > > +		if (err)
> > > +			return NULL;
> > > +
> > > +		ms_elapsed = ms_elapsed_timespec(&start, &curr);
> > > +	} while (ms_elapsed <= timeout_ms);
> >
> > let's simplify all the time keeping to use nanosecond timestamps and
> > only convert to ms when calling epoll_wait()? Then you can just have a
> > tiny helper to convert timespec to nanosecond ts ((u64)ts.tv_sec *
> > 1000000000 + ts.tv_nsec) and compare u64s directly. WDYT?
>
> Sounds like an improvement to me!
>
> Thanks,
> David
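
For reference, a rough sketch of the nanosecond-based timekeeping agreed on
above. The helper names timespec_to_ns() and ms_until_deadline() are
hypothetical, not libbpf's actual code, and rounding the remaining time up
to a whole millisecond is an assumption of this sketch (it keeps a
sub-millisecond remainder from turning into a zero-timeout busy poll).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a timespec to a single u64 nanosecond
 * timestamp, as suggested in the review.
 */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/* Hypothetical helper: milliseconds left until deadline_ns, converted only
 * at the point where epoll_wait() needs it. Rounds up to a whole
 * millisecond (assumption) and returns 0 once the deadline has passed, so
 * the result is never negative.
 */
static int ms_until_deadline(uint64_t now_ns, uint64_t deadline_ns)
{
	if (now_ns >= deadline_ns)
		return 0;
	return (int)((deadline_ns - now_ns + 999999) / 1000000);
}
```

With this, the deadline would be computed once as
timespec_to_ns(&start) + (uint64_t)timeout_ms * 1000000, and the loop
condition becomes a plain u64 comparison of the current timestamp against
that deadline.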