On Thu, Oct 31, 2024 at 9:19 AM zhongjinji <zhongjinji@xxxxxxxxx> wrote:
>
> >It seems you didn't use the ringbuf related APIs (e.g.,
> >ring_buffer__consume()) in libbpf to access the ring buffer. In my
> >understanding, the "xchg(&hdr->len, new_len)" in bpf_ringbuf_commit()
> >works as the barrier to ensure the order of committed data and hdr->len,
> >and accordingly the "smp_load_acquire(len_ptr)" in
> >ringbuf_process_ring() in libbpf works as the paired barrier to ensure
> >the order of hdr->len and the committed data. So I think the extra
> >smp_wmb() in the kernel is not necessary here. Instead, you should fix
> >your code in the userspace.
> >>
> >> Signed-off-by: zhongjinji <zhongjinji@xxxxxxxxx>
> >> ---
> >>  kernel/bpf/ringbuf.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> >> index e1cfe890e0be..a66059e2b0d6 100644
> >> --- a/kernel/bpf/ringbuf.c
> >> +++ b/kernel/bpf/ringbuf.c
> >> @@ -508,6 +508,10 @@ static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
> >>         rec_pos = (void *)hdr - (void *)rb->data;
> >>         cons_pos = smp_load_acquire(&rb->consumer_pos) & rb->mask;
> >>
> >> +       /* Make sure the modification of data is visible on other CPU's
> >> +        * before consume the event
> >> +        */
> >> +       smp_wmb();
> >>         if (flags & BPF_RB_FORCE_WAKEUP)
> >>                 irq_work_queue(&rb->work);
> >>         else if (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP))
>
> Well, I missed the implementation of __smp_store_release on the x86
> architecture; the extra barrier is not necessary there because
> __smp_store_release already has a memory barrier on x86,

Yep, we do rely on the xchg() and smp_load_acquire() pairing. I'm not
familiar with arm64 specifics and didn't know that this assumption is
x86-64-specific. Cc'ed Paul, who I believe suggested xchg() as a more
performant way of achieving this. That happened years ago, so all of
us, I'm sure, have lost a lot of context by now, but still, let's see.

> but it can't guarantee the order of the committed data on arm64, because
> __smp_store_release there does not include a memory barrier; it just
> ensures the variable itself is visible. The arm64 implementation is as
> follows, in common/arch/arm64/include/asm/barrier.h:
>
> #define __smp_load_acquire(p)                                           \
> ({                                                                      \
>         union { __unqual_scalar_typeof(*p) __val; char __c[1]; } __u;   \
>         typeof(p) __p = (p);                                            \
>         compiletime_assert_atomic_type(*p);                             \
>         kasan_check_read(__p, sizeof(*p));                              \
>         switch (sizeof(*p)) {                                           \
>         case 1:                                                         \
>                 asm volatile ("ldarb %w0, %1"                           \
>                         : "=r" (*(__u8 *)__u.__c)                       \
>                         : "Q" (*__p) : "memory");                       \
>                 break;                                                  \
>         case 2:                                                         \
>                 asm volatile ("ldarh %w0, %1"                           \
>                         : "=r" (*(__u16 *)__u.__c)                      \
>                         : "Q" (*__p) : "memory");                       \
>                 break;                                                  \
>         case 4:                                                         \
>                 asm volatile ("ldar %w0, %1"                            \
>                         : "=r" (*(__u32 *)__u.__c)                      \
>                         : "Q" (*__p) : "memory");                       \
>                 break;                                                  \
>         case 8:                                                         \
>                 asm volatile ("ldar %0, %1"                             \
>                         : "=r" (*(__u64 *)__u.__c)                      \
>                         : "Q" (*__p) : "memory");                       \
>                 break;                                                  \
>         }                                                               \
>         (typeof(*p))__u.__val;                                          \
> })
>
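
For reference, here is roughly what a hand-rolled (non-libbpf) consumer has
to do to pair with the kernel's xchg() in bpf_ringbuf_commit(). This is only
a minimal sketch along the lines of libbpf's ringbuf_process_ring(); struct
my_ring and process_sample() are made-up placeholders, not libbpf API, and
using ring_buffer__consume()/ring_buffer__poll() remains the recommended
route:

#include <linux/bpf.h>          /* BPF_RINGBUF_BUSY_BIT, _DISCARD_BIT, _HDR_SZ */
#include <stdint.h>

struct my_ring {
        unsigned long *consumer_pos;    /* word in the writable consumer page */
        unsigned long *producer_pos;    /* word in the read-only producer page */
        void *data;                     /* start of the mmap'ed data area */
        unsigned long mask;             /* ring size - 1 */
};

void process_sample(void *sample, uint32_t len);        /* user-supplied */

static void consume_ring(struct my_ring *r)
{
        unsigned long cons_pos = *r->consumer_pos;      /* only the consumer writes this */

        for (;;) {
                unsigned long prod_pos =
                        __atomic_load_n(r->producer_pos, __ATOMIC_ACQUIRE);

                if (cons_pos >= prod_pos)
                        break;

                uint32_t *len_ptr = (void *)((char *)r->data + (cons_pos & r->mask));
                /* Pairs with xchg(&hdr->len, new_len) in bpf_ringbuf_commit():
                 * the acquire load of the length orders all later reads of
                 * the sample payload after the read of the committed length.
                 */
                uint32_t len = __atomic_load_n(len_ptr, __ATOMIC_ACQUIRE);

                if (len & BPF_RINGBUF_BUSY_BIT)
                        break;          /* record reserved but not committed yet */

                if (!(len & BPF_RINGBUF_DISCARD_BIT))
                        process_sample((char *)len_ptr + BPF_RINGBUF_HDR_SZ, len);

                /* strip busy/discard bits, add header, round up to 8 bytes */
                cons_pos += ((len & ~(BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT)) +
                             BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
                /* Release store so the kernel's smp_load_acquire() of
                 * rb->consumer_pos sees our payload reads as done before
                 * the slot can be reused.
                 */
                __atomic_store_n(r->consumer_pos, cons_pos, __ATOMIC_RELEASE);
        }
}

The important part for this thread is the acquire load of the record length;
a plain load there is exactly the kind of userspace bug that the proposed
smp_wmb() in the kernel would only paper over.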