Re: [PATCH bpf-next v2] selftests/bpf: Fix arena_atomics selftest failure due to llvm change

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 5 Aug 2024 10:53:39 -0700

On Fri, Aug 2, 2024 at 7:59 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...) without
> return value like
>   __sync_fetch_and_add(&foo, 1);
> llvm BPF backend generates locked insn e.g.
>   lock *(u32 *)(r1 + 0) += r2
>
> If __sync_fetch_and_add(...) returns a value like
>   res = __sync_fetch_and_add(&foo, 1);
> llvm BPF backend generates like
>   r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
>
> But 'lock *(u32 *)(r1 + 0) += r2' caused a problem in jit
> since proper barrier is not inserted based on __sync_fetch_and_add() semantics.
>
> The above discrepancy is due to commit [2], which tried to maintain backward
> compatibility, since before commit [2] __sync_fetch_and_add(...) generated
> a lock insn in the BPF backend.
>
> Based on discussion in [1], now it is time to fix the above discrepancy so we can
> have proper barrier support in jit. llvm patch [3] made sure that __sync_fetch_and_add(...)
> always generates atomic_fetch_add(...) insns. Now 'lock *(u32 *)(r1 + 0) += r2' can only
> be generated by inline asm. The same for __sync_fetch_and_and(), __sync_fetch_and_or()
> and __sync_fetch_and_xor().
>
> But the change in [3] caused arena_atomics selftest failure.
>
>   test_arena_atomics:PASS:arena atomics skeleton open 0 nsec
>   libbpf: prog 'and': BPF program load failed: Permission denied
>   libbpf: prog 'and': -- BEGIN PROG LOAD LOG --
>   arg#0 reference type('UNKNOWN ') size cannot be determined: -22
>   0: R1=ctx() R10=fp0
>   ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87
>   0: (18) r1 = 0xffffc90000064000       ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4)
>   2: (61) r6 = *(u32 *)(r1 +0)          ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>   3: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
>   4: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>   5: (5d) if r0 != r6 goto pc+11        ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x)
>   ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91
>   6: (18) r1 = 0x100000000060           ; R1_w=scalar()
>   8: (bf) r1 = addr_space_cast(r1, 0, 1)        ; R1_w=arena
>   9: (18) r2 = 0x1100000000             ; R2_w=0x1100000000
>   11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2)
>   BPF_ATOMIC stores into R1 arena is not allowed
>   processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>   -- END PROG LOAD LOG --
>   libbpf: prog 'and': failed to load: -13
>   libbpf: failed to load object 'arena_atomics'
>   libbpf: failed to load BPF skeleton 'arena_atomics': -13
>   test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13)
>   #3       arena_atomics:FAIL
>
> The failure is caused by [4], where atomic{64,}_fetch_{and,or,xor}() are not
> allowed on arena addresses. Without llvm patch [3], the compiler generates a 'lock ...'
> insn and everything works fine.
>
> This patch fixes the problem by using inline asm. Instead of the __sync_fetch_and_{and,or,xor}() functions,
> inline asm with a 'lock' insn is used, which works with or without [3].
> Note that the three bpf programs ('and', 'or' and 'xor') are also guarded with __BPF_FEATURE_ADDR_SPACE_CAST
> to avoid compilation failure with llvm <= 18. Note that for llvm <= 18, where
> addr_space_cast is not supported, all arena_atomics subtests are skipped with the message below:
>   test_arena_atomics:SKIP:no ENABLE_ATOMICS_TESTS or no addr_space_cast support in clang
>   #3 arena_atomics:SKIP
>
>   [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@xxxxxxxxxx/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
>   [2] https://github.com/llvm/llvm-project/commit/286daafd65129228e08a1d07aa4ca74488615744
>   [3] https://github.com/llvm/llvm-project/pull/101428
>   [4] d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  .../selftests/bpf/progs/arena_atomics.c       | 63 ++++++++++++++++---
>  1 file changed, 54 insertions(+), 9 deletions(-)
>
> Changelog:
>   v1 -> v2:
>     - Add __BPF_FEATURE_ADDR_SPACE_CAST to guard newly added asm codes for llvm >= 19
>
> diff --git a/tools/testing/selftests/bpf/progs/arena_atomics.c b/tools/testing/selftests/bpf/progs/arena_atomics.c
> index bb0acd79d28a..dea54557fc00 100644
> --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> @@ -5,6 +5,7 @@
>  #include <bpf/bpf_tracing.h>
>  #include <stdbool.h>
>  #include "bpf_arena_common.h"
> +#include "bpf_misc.h"
>
>  struct {
>         __uint(type, BPF_MAP_TYPE_ARENA);
> @@ -85,10 +86,24 @@ int and(const void *ctx)
>  {
>         if (pid != (bpf_get_current_pid_tgid() >> 32))
>                 return 0;
> -#ifdef ENABLE_ATOMICS_TESTS
> +#if defined(ENABLE_ATOMICS_TESTS) && defined(__BPF_FEATURE_ADDR_SPACE_CAST)
>
> -       __sync_fetch_and_and(&and64_value, 0x011ull << 32);
> -       __sync_fetch_and_and(&and32_value, 0x011);
> +       asm volatile(
> +               "r1 = addr_space_cast(%[and64_value], 0, 1);"
> +               "lock *(u64 *)(r1 + 0) &= %[val]"
> +               :
> +               : __imm_ptr(and64_value),
> +                 [val]"r"(0x011ull << 32)
> +               : "r1"
> +       );
> +       asm volatile(
> +               "r1 = addr_space_cast(%[and32_value], 0, 1);"
> +               "lock *(u32 *)(r1 + 0) &= %[val]"
> +               :
> +               : __imm_ptr(and32_value),
> +                 [val]"w"(0x011)
> +               : "r1"
> +       );

Instead of inline asm there is a better way to do the same in C.
https://godbolt.org/z/71PYx1oqE

void foo(int a, _Atomic int *b)
{
 *b += a;
}

generates:
lock *(u32 *)(r2 + 0) += r1

but
*b &= a;

crashes llvm :( with

<source>:3:5: error: unsupported atomic operation, please use 64 bit version
    3 |  *b &= a;

but works with -mcpu=v3

So let's make this test -mcpu=v3 only and use normal C?
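
A minimal sketch of what the "normal C" form of the 'and' test could look
like, assuming _Atomic-qualified variables. It is plain host C so it can be
compiled anywhere. The variable names mirror the selftest, but the initial
values, the do_and() helper and the absence of the __arena address space are
illustrative assumptions, not the actual selftest code. For the BPF target
this form would need -mcpu=v3, as noted above.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the selftest globals. In the real test these
 * would live in the arena; the initial values here are assumptions.
 */
static _Atomic uint64_t and64_value = (0x110ull << 32);
static _Atomic uint32_t and32_value = 0x110;

/*
 * Compound assignment on an _Atomic object is an atomic read-modify-write
 * in C11, so no __sync_fetch_and_and() call or inline asm is needed.
 * The result of each expression is deliberately unused.
 */
static void do_and(void)
{
	and64_value &= (0x011ull << 32);
	and32_value &= 0x011;
}
```

Whether the BPF backend emits the 'lock' form or the fetch form for these
statements depends on the llvm version and -mcpu, so this only shows the
C side of the idea.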

pw-bot: cr