Ilya Leoshkevich <iii@xxxxxxxxxxxxx> writes: >> Puranjay Mohan <puranjay@xxxxxxxxxx> writes: >> >> > The BPF atomic operations with the BPF_FETCH modifier along with >> > BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT >> > implements >> > all atomic operations except BPF_CMPXCHG with relaxed ordering. >> >> I know that the BPF memory model is in the works and we currently >> don't >> have a way to make all the JITs consistent. But as far as atomic >> operations are concerned here are my observations: > > [...] > >> 4. S390 >> ---- >> >> Ilya, can you help with this? >> >> I see that the kernel is emitting "bcr 14,0" after "laal|laalg" but >> the >> JIT is not. > > Hi, > > Here are two relevant paragraphs from the Principles of Operation: > > Relation between Operand Accesses > ================================= > As observed by other CPUs and by channel pro- > grams, storage-operand fetches associated with one > instruction execution appear to precede all storage- > operand references for conceptually subsequent > instructions. A storage-operand store specified by > one instruction appears to precede all storage-oper- > and stores specified by conceptually subsequent > instructions, but it does not necessarily precede stor- > age-operand fetches specified by conceptually sub- > sequent instructions. However, a storage-operand > store appears to precede a conceptually subsequent > storage-operand fetch from the same main-storage > location. > > In short, all memory accesses are fully ordered except for > stores followed by fetches from different addresses. Thanks for sharing the details. So, this is TSO like x86 > LAALG R1,R3,D2(B2) > ================== > [...] > All accesses to the second-operand location appear > to be a block-concurrent interlocked-update refer- > ence as observed by other CPUs and the I/O subsys- > tem. A specific-operand-serialization operation is > performed. > > Specific-operand-serialization is weaker than full serialization, > which means that, even though s390x provides very strong ordering > guarantees, strictly speaking, as architected, s390x atomics are not > fully ordered. > > I have a hard time thinking of a situation where a store-fetch > reordering for different addresses could matter, but to be on the safe > side we should probably just do what the kernel does and add a > "bcr 14,0". I will send a patch. Thanks, IMO, bcr 14,0 would be needed because, s390x has both int __atomic_add(int i, int *v); and int __atomic_add_barrier(int i, int *v); both of these do the fetch operation but the second one adds a barrier (bcr 14, 0) JIT was using the first one (without barrier) to implement the arch_atomic_fetch_add So, assuming that without this barrier, store->fetch reordering would be allowed as you said. It would mean: This litmus test would fail for the s390 JIT: C SB+atomic_fetch (* * Result: Never * * This litmus test demonstrates that LKMM expects total ordering from * atomic_*() operations with fetch or return. *) { atomic_t dummy1 = ATOMIC_INIT(1); atomic_t dummy2 = ATOMIC_INIT(1); } P0(int *x, int *y, atomic_t *dummy1) { WRITE_ONCE(*x, 1); rd = atomic_fetch_add(1, dummy1); /* assuming this is implemented without barrier */ r0 = READ_ONCE(*y); } P1(int *x, int *y, atomic_t *dummy2) { WRITE_ONCE(*y, 1); rd = atomic_fetch_add(1, dummy2); /* assuming this is implemented without barrier */ r1 = READ_ONCE(*x); } exists (0:r0=0 /\ 1:r1=0) P.S. - It would be nice to have a tool that can convert litmus tests into BPF assembly programs and then we can test them on hardware after JITing. Thanks, Puranjay