On Wed, Aug 23, 2023 at 3:14 PM Andrew Werner <awerner32@xxxxxxxxx> wrote:
>
> On Wed, Aug 23, 2023 at 12:52 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 22, 2023 at 6:49 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On 8/22/2023 1:15 PM, Andrii Nakryiko wrote:
> > > > On Mon, Jul 24, 2023 at 6:24 AM Andrew Werner <awerner32@xxxxxxxxx> wrote:
> > > >> Before this patch, the producer position could overflow `unsigned
> > > >> long`, in which case libbpf would forever stop processing new writes to
> > > >> the ringbuf. This patch addresses that bug by avoiding ordered
> > > >> comparison between the consumer and producer position. If the consumer
> > > >> position is greater than the producer position, the assumption is that
> > > >> the producer has overflowed.
> > > >>
> > > >> A more defensive check could be to ensure that the delta is within
> > > >> the allowed range, but such defensive checks are neither present in
> > > >> the kernel side code nor in libbpf. The overflow that this patch
> > > >> handles can occur while the producer and consumer follow a correct
> > > >> protocol.
> > > > Yep, great find!
> > > >
> > > >> A selftest was written to demonstrate the bug, and indeed this patch
> > > >> allows the test to continue to make progress past the overflow.
> > > >> However, the author was unable to create a testing environment on a
> > > >> 32-bit machine, and the test requires substantial memory and over 4
> > > > 2GB of memory for ringbuf, right? Perhaps it would be good to just
> > > > outline the repro, even if we won't have it implemented in selftests.
> > > > Something along the lines of: a) set up ringbuf of 2GB size and
> > > > reserve+commit maximum-sized record (UMAX/4) constantly as fast as
> > > > possible. With 1 million records per second repro time should be about
> > > > 4.7 hours. Can you please update the commit with something like that
> > > > instead of a vague "there is repro, but I won't show it ;)" ? Thanks!
> > >
> > > I think it would be great that the commit message can elaborate on
> > > the repro. Andrew had already posted an external link to the reproducer
> > > in v2 [0]: https://github.com/ajwerner/bpf/commit/85e1240e7713
> >
> > Yeah, of course, I saw it. I didn't mean that he actually refuses to
> > show repro, it's just that the commit message is vague about it and we
> > should improve that. I think we are on the same page. Including the
> > link to full repro in addition to more detailed description is also
> > good.
>
> I'll update the message as you suggest in v3.
>
> >
> > >
> > > [0]:
> > > https://lore.kernel.org/bpf/20230724132543.1282645-1-awerner32@xxxxxxxxx/
> > > >> hours to hit the overflow point on a 64-bit machine. Thus, the test
> > > >> is not included in this patch because of the impracticality of running
> > > >> it.
> > > >>
> > > >> Additionally, this patch adds commentary around a separate point to note
> > > >> that the modular arithmetic is valid in the face of overflows, as that
> > > >> fact may not be obvious to future readers.
> > > >>
> > > >> Signed-off-by: Andrew Werner <awerner32@xxxxxxxxx>
> > > >> ---
> > > >>  tools/lib/bpf/ringbuf.c | 11 ++++++++++-
> > > >>  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > > >> index 02199364db13..6271757bc3d2 100644
> > > >> --- a/tools/lib/bpf/ringbuf.c
> > > >> +++ b/tools/lib/bpf/ringbuf.c
> > > >> @@ -237,7 +237,11 @@ static int64_t ringbuf_process_ring(struct ring *r)
> > > >>          do {
> > > >>                  got_new_data = false;
> > > >>                  prod_pos = smp_load_acquire(r->producer_pos);
> > > >> -                while (cons_pos < prod_pos) {
> > > >> +
> > > >> +                /* Avoid signed comparisons between the positions; the producer
> > > > "signed comparisons" is confusing and invalid, as cons_pos and
> > > > prod_pos are unsigned and comparison here is unsigned. What you meant
> > > > is inequality comparison, which is invalid when done naively (like it
> > > > is done in libbpf right now, sigh...), if counters can wrap around.
> > > >
> > > >> +                 * position can overflow before the consumer position.
> > > >> +                 */
> > > >> +                while (cons_pos != prod_pos) {
> > > > I'm wondering if we should preserve the "consumer pos is before
> > > > producer pos" check just for clarity's sake (with a comment about
> > > > wrapping around of counters, of course) like:
> > > >
> > > >   if ((long)(cons_pos - prod_pos) < 0) ?
>
> Will do.
>
> >
> > > > BTW, I think kernel code needs fixing as well in
> > > > __bpf_user_ringbuf_peek (we should not compare consumer/producer positions
> > > > directly, only through subtraction and casting to signed long as
> > > > above), would you be able to fix it at the same time with libbpf?
> > > > Would be good to also double-check the rest of kernel/bpf/ringbuf.c to
> > > > make sure we don't directly compare positions anywhere else.
> >
> > > I missed __bpf_user_ringbuf_peek() when reviewing the patch. I also can
> > > help to double-check kernel/bpf/ringbuf.c.
> >
> > great, thanks!
>
> I'll update the code in __bpf_user_ringbuf_peek. One observation is
> that the code there differs from the regular (non-user) ringbuf and
> the libbpf code in that it uses u64 for the positions whereas
> everywhere else (including in the struct definition) the types of
> these positions are unsigned long. I get that on 64-bit architectures,
> practically speaking, these are the same. What I'm less clear on is
> whether there's anything that prevents this code from running (perhaps
> with bugs) on 32-bit machines. I suppose that on little-endian
> machines things will just work out until the positions overflow 32
> bits.
>
> Would it make sense for me to make the change to always use unsigned
> long for these values in the same change?

I'm not sure, but I suspect that u64 usage is not intentional, so we
probably do want to use unsigned long consistently. cc'ing David as well

>
> >
> > > >
> > > >>                          len_ptr = r->data + (cons_pos & r->mask);
> > > >>                          len = smp_load_acquire(len_ptr);
> > > >>
> > > >> @@ -498,6 +502,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
> > > >>          prod_pos = smp_load_acquire(rb->producer_pos);
> > > >>
> > > >>          max_size = rb->mask + 1;
> > > >> +
> > > >> +        /* Note that this formulation is valid in the face of overflow of prod_pos
> > > >> +         * so long as the delta between prod_pos and cons_pos remains no
> > > >> +         * greater than max_size.
> > > >> +         */
> > > >>          avail_size = max_size - (prod_pos - cons_pos);
> > > >>          /* Round up total size to a multiple of 8. */
> > > >>          total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
> > > >> --
> > > >> 2.39.2
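
For readers following along, Andrii's "about 4.7 hours" repro estimate checks out with simple arithmetic: taking the UMAX/4 maximum record size (about 1GB) and one million commits per second at face value, the 64-bit producer position needs roughly 2^64 / (UINT_MAX/4) / 10^6 seconds to wrap. A throwaway back-of-the-envelope sketch (not part of the patch or any selftest; the record size and rate are the figures quoted above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Assumed repro parameters from the discussion above:
	 * maximum record size of UINT_MAX/4 bytes (~1GB) committed
	 * about one million times per second.
	 */
	double bytes_per_rec = (double)(UINT32_MAX / 4);
	double wrap_bytes    = 18446744073709551616.0;  /* 2^64 */
	double seconds       = wrap_bytes / bytes_per_rec / 1e6;

	printf("~%.2f hours until the producer position wraps\n",
	       seconds / 3600.0);
	return 0;
}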
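
The comparison problem itself can also be reduced to a standalone sketch. This is not libbpf code and the position values are invented; it only illustrates why the old ordered check stalls once prod_pos wraps past ULONG_MAX, while both the patch's inequality check and the signed-delta form suggested above keep working:

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical positions: the producer has just wrapped past
	 * ULONG_MAX while the consumer still lags 24 bytes behind.
	 */
	unsigned long cons_pos = ULONG_MAX - 8;
	unsigned long prod_pos = cons_pos + 24;	/* wraps around to 15 */

	/* Pre-patch check: evaluates to 0 after the wrap, so the ring
	 * would never be drained again.
	 */
	printf("cons_pos < prod_pos         -> %d\n", cons_pos < prod_pos);

	/* Check used by the patch: still 1 while unread data remains. */
	printf("cons_pos != prod_pos        -> %d\n", cons_pos != prod_pos);

	/* Variant suggested above: the signed view of the delta stays
	 * negative while the consumer is behind, wrap or no wrap
	 * (prints -24 here).
	 */
	printf("(long)(cons_pos - prod_pos) -> %ld\n",
	       (long)(cons_pos - prod_pos));
	return 0;
}

The same reasoning applies on a 32-bit build, where unsigned long is 32 bits and the wrap point is reached after only 4GB of produced data.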
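
Likewise, the comment being added to user_ring_buffer__reserve() can be sanity-checked in isolation: because the subtraction is modular, prod_pos - cons_pos recovers the true number of unread bytes even after prod_pos has wrapped, provided that delta never exceeds max_size, so avail_size comes out right. Another small sketch with invented values (again not libbpf code):

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Invented values: a 2MB ring whose producer position has wrapped. */
	unsigned long max_size = 1UL << 21;
	unsigned long cons_pos = ULONG_MAX - 1000;
	unsigned long prod_pos = cons_pos + 4096;	/* wraps past ULONG_MAX */

	/* Modular subtraction yields 4096 despite the wrap, so the
	 * available space is computed correctly.
	 */
	unsigned long used_size  = prod_pos - cons_pos;
	unsigned long avail_size = max_size - used_size;

	printf("used = %lu, avail = %lu\n", used_size, avail_size);
	return 0;
}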