Re: [PATCH] libbpf: handle producer position overflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 24, 2023 at 6:24 AM Andrew Werner <awerner32@xxxxxxxxx> wrote:
>
> Before this patch, the producer position could overflow `unsigned
> long`, in which case libbpf would forever stop processing new writes to
> the ringbuf. This patch addresses that bug by avoiding ordered
> comparison between the consumer and producer position. If the consumer
> position is greater than the producer position, the assumption is that
> the producer has overflowed.
>
> A more defensive check could be to ensure that the delta is within
> the allowed range, but such defensive checks are neither present in
> the kernel side code nor in libbpf. The overflow that this patch
> handles can occur while the producer and consumer follow a correct
> protocol.

Yep, great find!

>
> A selftest was written to demonstrate the bug, and indeed this patch
> allows the test to continue to make progress past the overflow.
> However, the author was unable to create a testing environment on a
> 32-bit machine, and the test requires substantial memory and over 4

2GB of memory for ringbuf, right? Perhaps it would be good to just
outline the repro, even if we won't have it implemented in selftests.
Something along the lines of: a) set up ringbuf of 2GB size and
reserve+commit maximum-sized record (UMAX/4) constantly as fast as
possible. With 1 million records per second repro time should be about
4.7 hours. Can you please update the commit with something like that
instead of a vague "there is repro, but I won't show it ;)" ? Thanks!

> hours to hit the overflow point on a 64-bit machine. Thus, the test
> is not included in this patch because of the impracticality of running
> it.
>
> Additionally, this patch adds commentary around a separate point to note
> that the modular arithmetic is valid in the face of overflows, as that
> fact may not be obvious to future readers.
>
> Signed-off-by: Andrew Werner <awerner32@xxxxxxxxx>
> ---
>  tools/lib/bpf/ringbuf.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index 02199364db13..6271757bc3d2 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -237,7 +237,11 @@ static int64_t ringbuf_process_ring(struct ring *r)
>         do {
>                 got_new_data = false;
>                 prod_pos = smp_load_acquire(r->producer_pos);
> -               while (cons_pos < prod_pos) {
> +
> +               /* Avoid signed comparisons between the positions; the producer

"signed comparisons" is confusing and invalid, as cons_pos and
prod_pos are unsigned and comparison here is unsigned. What you meant
is inequality comparison, which is invalid when done naively (like it
is done in libbpf right now, sigh...), if counters can wrap around.

> +                * position can overflow before the consumer position.
> +                */
> +               while (cons_pos != prod_pos) {

I'm wondering if we should preserve the "consumer pos is before
producer pos" check just for clarity's sake (with a comment about
wrapping around of counters, of course) like:

if ((long)(cons_pos - prod_pos) < 0) ?

BTW, I think kernel code needs fixing as well in
__bpf_user_ringbuf_peek (we should compare consumer/producer positions
directly, only through subtraction and casting to signed long as
above), would you be able to fix it at the same time with libbpf?
Would be good to also double-check the rest of kernel/bpf/ringbuf.c to
make sure we don't directly compare positions anywhere else.

>                         len_ptr = r->data + (cons_pos & r->mask);
>                         len = smp_load_acquire(len_ptr);
>
> @@ -498,6 +502,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
>         prod_pos = smp_load_acquire(rb->producer_pos);
>
>         max_size = rb->mask + 1;
> +
> +       /* Note that this formulation in the face of overflow of prod_pos
> +        * so long as the delta between prod_pos and cons_pos remains no
> +        * greater than max_size.
> +        */
>         avail_size = max_size - (prod_pos - cons_pos);
>         /* Round up total size to a multiple of 8. */
>         total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
> --
> 2.39.2
>
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux