Re: [PATCH bpf] riscv, bpf: make some atomic operations fully ordered

Puranjay Mohan <puranjay@xxxxxxxxxx> · Mon, 06 May 2024 14:46:53 +0000

Ilya Leoshkevich <iii@xxxxxxxxxxxxx> writes:

>> Puranjay Mohan <puranjay@xxxxxxxxxx> writes:
>> 
>> > The BPF atomic operations with the BPF_FETCH modifier along with
>> > BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT
>> > implements
>> > all atomic operations except BPF_CMPXCHG with relaxed ordering.
>> 
>> I know that the BPF memory model is in the works and we currently
>> don't
>> have a way to make all the JITs consistent. But as far as atomic
>> operations are concerned here are my observations:
>
> [...]
>
>> 4. S390
>>    ----
>> 
>> Ilya, can you help with this?
>> 
>> I see that the kernel is emitting "bcr 14,0" after "laal|laalg" but
>> the
>> JIT is not.
>
> Hi,
>
> Here are two relevant paragraphs from the Principles of Operation:
>
>   Relation between Operand Accesses
>   =================================
>   As observed by other CPUs and by channel pro-
>   grams, storage-operand fetches associated with one
>   instruction execution appear to precede all storage-
>   operand references for conceptually subsequent
>   instructions. A storage-operand store specified by
>   one instruction appears to precede all storage-oper-
>   and stores specified by conceptually subsequent
>   instructions, but it does not necessarily precede stor-
>   age-operand fetches specified by conceptually sub-
>   sequent instructions. However, a storage-operand
>   store appears to precede a conceptually subsequent
>   storage-operand fetch from the same main-storage
>   location.
>
> In short, all memory accesses are fully ordered except for
> stores followed by fetches from different addresses.

Thanks for sharing the details.

So, this is TSO like x86

>   LAALG R1,R3,D2(B2)
>   ==================
>   [...]
>   All accesses to the second-operand location appear
>   to be a block-concurrent interlocked-update refer-
>   ence as observed by other CPUs and the I/O subsys-
>   tem. A specific-operand-serialization operation is
>   performed.
>
> Specific-operand-serialization is weaker than full serialization,
> which means that, even though s390x provides very strong ordering
> guarantees, strictly speaking, as architected, s390x atomics are not
> fully ordered.
>
> I have a hard time thinking of a situation where a store-fetch
> reordering for different addresses could matter, but to be on the safe
> side we should probably just do what the kernel does and add a
> "bcr 14,0". I will send a patch.

Thanks,

IMO, bcr 14,0 would be needed because, s390x has both

  int __atomic_add(int i, int *v);

and

  int __atomic_add_barrier(int i, int *v);

both of these do the fetch operation but the second one adds a barrier
(bcr 14, 0)

JIT was using the first one (without barrier) to implement the arch_atomic_fetch_add

So, assuming that without this barrier, store->fetch reordering would be
allowed as you said.

It would mean:
This litmus test would fail for the s390 JIT:

  C SB+atomic_fetch

  (*
   * Result: Never
   *
   * This litmus test demonstrates that LKMM expects total ordering from
   * atomic_*() operations with fetch or return.
   *)

  {
          atomic_t dummy1 = ATOMIC_INIT(1);
          atomic_t dummy2 = ATOMIC_INIT(1);
  }

  P0(int *x, int *y, atomic_t *dummy1)
  {
          WRITE_ONCE(*x, 1);
          rd = atomic_fetch_add(1, dummy1);     /* assuming this is implemented without barrier */ 
          r0 = READ_ONCE(*y);
  }

  P1(int *x, int *y, atomic_t *dummy2)
  {
          WRITE_ONCE(*y, 1);
          rd = atomic_fetch_add(1, dummy2);    /* assuming this is implemented without barrier */
          r1 = READ_ONCE(*x);
  }

  exists (0:r0=0 /\ 1:r1=0)

P.S. - It would be nice to have a tool that can convert litmus tests
into BPF assembly programs and then we can test them on hardware after JITing.

Thanks,
Puranjay