On Wed, Aug 23, 2023 at 3:14 PM Andrew Werner <awerner32@xxxxxxxxx> wrote:
>
> On Wed, Aug 23, 2023 at 12:52 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 22, 2023 at 6:49 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On 8/22/2023 1:15 PM, Andrii Nakryiko wrote:
> > > > On Mon, Jul 24, 2023 at 6:24 AM Andrew Werner <awerner32@xxxxxxxxx> wrote:
> > > >> Before this patch, the producer position could overflow `unsigned
> > > >> long`, in which case libbpf would forever stop processing new writes to
> > > >> the ringbuf. This patch addresses that bug by avoiding ordered
> > > >> comparison between the consumer and producer position. If the consumer
> > > >> position is greater than the producer position, the assumption is that
> > > >> the producer has overflowed.
> > > >>
> > > >> A more defensive check could be to ensure that the delta is within
> > > >> the allowed range, but such defensive checks are neither present in
> > > >> the kernel side code nor in libbpf. The overflow that this patch
> > > >> handles can occur while the producer and consumer follow a correct
> > > >> protocol.
> > > > Yep, great find!
> > > >
> > > >> A selftest was written to demonstrate the bug, and indeed this patch
> > > >> allows the test to continue to make progress past the overflow.
> > > >> However, the author was unable to create a testing environment on a
> > > >> 32-bit machine, and the test requires substantial memory and over 4
> > > > 2GB of memory for ringbuf, right? Perhaps it would be good to just
> > > > outline the repro, even if we won't have it implemented in selftests.
> > > > Something along the lines of: a) set up ringbuf of 2GB size and
> > > > reserve+commit maximum-sized record (UMAX/4) constantly as fast as
> > > > possible. With 1 million records per second repro time should be about
> > > > 4.7 hours. Can you please update the commit with something like that
> > > > instead of a vague "there is repro, but I won't show it ;)" ? Thanks!
> > >
> > > I think it would be great that the commit message can elaborate on
> > > the repro. Andrew had already posted an external link to the reproducer
> > > in v2 [0]: https://github.com/ajwerner/bpf/commit/85e1240e7713
> >
> > Yeah, of course, I saw it. I didn't mean that he actually refuses to
> > show repro, it's just that the commit message is vague about it and we
> > should improve that. I think we are on the same page. Including the
> > link to full repro in addition to more detailed description is also
> > good.
>
> I'll update the message as you suggest in v3.
>
> >
> > >
> > > [0]:
> > > https://lore.kernel.org/bpf/20230724132543.1282645-1-awerner32@xxxxxxxxx/
> > > >> hours to hit the overflow point on a 64-bit machine. Thus, the test
> > > >> is not included in this patch because of the impracticality of running
> > > >> it.
> > > >>
> > > >> Additionally, this patch adds commentary around a separate point to note
> > > >> that the modular arithmetic is valid in the face of overflows, as that
> > > >> fact may not be obvious to future readers.
> > > >>
> > > >> Signed-off-by: Andrew Werner <awerner32@xxxxxxxxx>
> > > >> ---
> > > >>  tools/lib/bpf/ringbuf.c | 11 ++++++++++-
> > > >>  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> > > >> index 02199364db13..6271757bc3d2 100644
> > > >> --- a/tools/lib/bpf/ringbuf.c
> > > >> +++ b/tools/lib/bpf/ringbuf.c
> > > >> @@ -237,7 +237,11 @@ static int64_t ringbuf_process_ring(struct ring *r)
> > > >>          do {
> > > >>                  got_new_data = false;
> > > >>                  prod_pos = smp_load_acquire(r->producer_pos);
> > > >> -                while (cons_pos < prod_pos) {
> > > >> +
> > > >> +                /* Avoid signed comparisons between the positions; the producer
> > > > "signed comparisons" is confusing and invalid, as cons_pos and
> > > > prod_pos are unsigned and comparison here is unsigned. What you meant
> > > > is inequality comparison, which is invalid when done naively (like it
> > > > is done in libbpf right now, sigh...), if counters can wrap around.
> > > >
> > > >> +                 * position can overflow before the consumer position.
> > > >> +                 */
> > > >> +                while (cons_pos != prod_pos) {
> > > > I'm wondering if we should preserve the "consumer pos is before
> > > > producer pos" check just for clarity's sake (with a comment about
> > > > wrapping around of counters, of course) like:
> > > >
> > > >   if ((long)(cons_pos - prod_pos) < 0) ?
>
> Will do.
>
> >
> > > > BTW, I think kernel code needs fixing as well in
> > > > __bpf_user_ringbuf_peek (we should not compare consumer/producer positions
> > > > directly, only through subtraction and casting to signed long as
> > > > above), would you be able to fix it at the same time with libbpf?
> > > > Would be good to also double-check the rest of kernel/bpf/ringbuf.c to
> > > > make sure we don't directly compare positions anywhere else.
> >
> > > I missed __bpf_user_ringbuf_peek() when reviewing the patch. I also can
> > > help to double-check kernel/bpf/ringbuf.c.
> >
> > great, thanks!
>
> I'll update the code in __bpf_user_ringbuf_peek. One observation is
> that the code there differs from the regular (non-user) ringbuf and
> the libbpf code in that it uses u64 for the positions whereas
> everywhere else (including in the struct definition) the types of
> these positions are unsigned long. I get that on 64-bit architectures,
> practically speaking, these are the same. What I'm less clear on is
> whether there's anything that prevents this code from running (perhaps
> with bugs) on 32-bit machines. I suppose that on little-endian
> machines things will just work out until the positions overflow 32
> bits.
>
> Would it make sense for me to make the change to always use unsigned
> long for these values in the same change?

I'm not sure, but I suspect that u64 usage is not intentional, so we
probably do want to use unsigned long consistently. cc'ing David as well

>
> >
> > > >
> > > >>                          len_ptr = r->data + (cons_pos & r->mask);
> > > >>                          len = smp_load_acquire(len_ptr);
> > > >>
> > > >> @@ -498,6 +502,11 @@ void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
> > > >>          prod_pos = smp_load_acquire(rb->producer_pos);
> > > >>
> > > >>          max_size = rb->mask + 1;
> > > >> +
> > > >> +        /* Note that this formulation is valid in the face of overflow of prod_pos
> > > >> +         * so long as the delta between prod_pos and cons_pos remains no
> > > >> +         * greater than max_size.
> > > >> +         */
> > > >>          avail_size = max_size - (prod_pos - cons_pos);
> > > >>          /* Round up total size to a multiple of 8. */
> > > >>          total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
> > > >> --
> > > >> 2.39.2
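
For readers following along, Andrii's "about 4.7 hours" repro estimate checks out with simple arithmetic: taking the UMAX/4 maximum record size (about 1GB) and one million commits per second at face value, the 64-bit producer position needs roughly 2^64 / (UINT_MAX/4) / 10^6 seconds to wrap. A throwaway back-of-the-envelope sketch (not part of the patch or any selftest; the record size and rate are the figures quoted above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Assumed repro parameters from the discussion above:
	 * maximum record size of UINT_MAX/4 bytes (~1GB) committed
	 * about one million times per second.
	 */
	double bytes_per_rec = (double)(UINT32_MAX / 4);
	double wrap_bytes    = 18446744073709551616.0;  /* 2^64 */
	double seconds       = wrap_bytes / bytes_per_rec / 1e6;

	printf("~%.2f hours until the producer position wraps\n",
	       seconds / 3600.0);
	return 0;
}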
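
The comparison problem itself can also be reduced to a standalone sketch. This is not libbpf code and the position values are invented; it only illustrates why the old ordered check stalls once prod_pos wraps past ULONG_MAX, while both the patch's inequality check and the signed-delta form suggested above keep working:

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical positions: the producer has just wrapped past
	 * ULONG_MAX while the consumer still lags 24 bytes behind.
	 */
	unsigned long cons_pos = ULONG_MAX - 8;
	unsigned long prod_pos = cons_pos + 24;	/* wraps around to 15 */

	/* Pre-patch check: evaluates to 0 after the wrap, so the ring
	 * would never be drained again.
	 */
	printf("cons_pos < prod_pos         -> %d\n", cons_pos < prod_pos);

	/* Check used by the patch: still 1 while unread data remains. */
	printf("cons_pos != prod_pos        -> %d\n", cons_pos != prod_pos);

	/* Variant suggested above: the signed view of the delta stays
	 * negative while the consumer is behind, wrap or no wrap
	 * (prints -24 here).
	 */
	printf("(long)(cons_pos - prod_pos) -> %ld\n",
	       (long)(cons_pos - prod_pos));
	return 0;
}

The same reasoning applies on a 32-bit build, where unsigned long is 32 bits and the wrap point is reached after only 4GB of produced data.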
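
Likewise, the comment being added to user_ring_buffer__reserve() can be sanity-checked in isolation: because the subtraction is modular, prod_pos - cons_pos recovers the true number of unread bytes even after prod_pos has wrapped, provided that delta never exceeds max_size, so avail_size comes out right. Another small sketch with invented values (again not libbpf code):

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Invented values: a 2MB ring whose producer position has wrapped. */
	unsigned long max_size = 1UL << 21;
	unsigned long cons_pos = ULONG_MAX - 1000;
	unsigned long prod_pos = cons_pos + 4096;	/* wraps past ULONG_MAX */

	/* Modular subtraction yields 4096 despite the wrap, so the
	 * available space is computed correctly.
	 */
	unsigned long used_size  = prod_pos - cons_pos;
	unsigned long avail_size = max_size - used_size;

	printf("used = %lu, avail = %lu\n", used_size, avail_size);
	return 0;
}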