Not only .aqrl; I think the sequence below could also be RCsc once
sc.w.aq is executed:

    A: Pre-Access
    B: lr.w.rl ADDR-0
    ...
    C: sc.w.aq ADDR-0
    D: Post-Access

Because sc.w.aq has an overlapping address and a data dependency on
lr.w.rl, the global memory order should be A->B->C->D when sc.w.aq is
executed.

For the amoswap

The purpose of the whole patchset is to reduce the use of standalone
"fence rw, rw" instructions and to maximize the use of the
.aq/.rl/.aqrl annotations of RISC-V.

        __asm__ __volatile__ (                          \
                "0:     lr.w %0, %2\n"                  \
                "       bne  %0, %z3, 1f\n"             \
                "       sc.w.rl %1, %z4, %2\n"          \
                "       bnez %1, 0b\n"                  \
                "       fence rw, rw\n"                 \
                "1:\n"                                  \

> we end up with u == 1, v == 1, r1 on P0 is 0 and r1 on P1 is 0, for the
> following litmus test?
>
> C lr-sc-aqrl-pair-vs-full-barrier
>
> {}
>
> P0(int *x, int *y, atomic_t *u)
> {
>         int r0;
>         int r1;
>
>         WRITE_ONCE(*x, 1);
>         r0 = atomic_cmpxchg(u, 0, 1);
>         r1 = READ_ONCE(*y);
> }
>
> P1(int *x, int *y, atomic_t *v)
> {
>         int r0;
>         int r1;
>
>         WRITE_ONCE(*y, 1);
>         r0 = atomic_cmpxchg(v, 0, 1);
>         r1 = READ_ONCE(*x);
> }
>
> exists (u=1 /\ v=1 /\ 0:r1=0 /\ 1:r1=0)

I think my patchset won't affect the ordering guarantee for the above
sequence. The current RISC-V implementation only gives RCsc when the
loaded value matches the expected value at least once; on a mismatch,
the bne jumps to "1:" and skips the trailing fence. So I would prefer
the RISC-V cmpxchg to be:

-               "0:     lr.w %0, %2\n"                  \
+               "0:     lr.w.rl %0, %2\n"               \
                "       bne  %0, %z3, 1f\n"             \
                "       sc.w.rl %1, %z4, %2\n"          \
                "       bnez %1, 0b\n"                  \
-               "       fence rw, rw\n"                 \
                "1:\n"                                  \
+               "       fence w, rw\n"                  \

This gives an unconditional RCsc for atomic_cmpxchg.

>
> Regards,
> Boqun

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/