On Fri, Jul 06, 2018 at 07:01:31PM +0800, Guo Ren wrote:
> On Thu, Jul 05, 2018 at 07:50:59PM +0200, Peter Zijlstra wrote:

> > What are the memory ordering rules for your LDEX/STEX ?
> Every CPU has a local exclusive monitor.
>
> "Ldex rz, (rx, #off)" will add an entry into the local monitor, and the
> entry is composed of an address tag and an exclusive flag (inited with 1).
> Any stores (including other cores') will break the exclusive flag to 0 in
> the entry which could be indexed by the address tag.
>
> "Stex rz, (rx, #off)" has two conditions:
> 1. Store Success: When the entry's exclusive flag is 1, it will store rz
>    to the [rx + off] address and the rz will be set to 1.
> 2. Store Failure: When the entry's exclusive flag is 0, just rz will be
>    set to 0.

That's how LL/SC works. What I was asking is if they have any effect on
memory ordering. Some architectures have LL/SC imply memory ordering,
most do not.

Going by your spinlock implementation they don't imply any memory
ordering.

> > The mandated semantics for xchg() / cmpxchg() is an effective smp_mb()
> > before _and_ after.
>
>     switch (size) {                                 \
>     case 4:                                         \
>         smp_mb();                                   \
>         asm volatile (                              \
>         "1: ldex.w      %0, (%3) \n"                \
>         "   mov         %1, %2   \n"                \
>         "   stex.w      %1, (%3) \n"                \
>         "   bez         %1, 1b   \n"                \
>             : "=&r" (__ret), "=&r" (tmp)            \
>             : "r" (__new), "r"(__ptr)               \
>             : "memory");                            \
>         smp_mb();                                   \
>         break;                                      \
> Hmm?
> But I couldn't understand what's wrong without the 1st smp_mb()?
> 1st smp_mb will make all ld/st finish before ldex.w. Is it necessary?

Yes.

    CPU0                        CPU1

    r1 = READ_ONCE(x);          WRITE_ONCE(y, 1);
    r2 = xchg(&y, 2);           smp_store_release(&x, 1);

must not allow: r1==1 && r2==0

> > The above implementation suggests LDEX implies a SYNC.IS, is this
> > correct?
> No, ldex doesn't imply a sync.is.

Right, as per the spinlock emails, then your proposed primitives are
incorrect.
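
For illustration only, the litmus test above can be checked with a tiny
exhaustive model. This is a rough sketch, not kernel code: it assumes x and y
start at 0, treats memory as sequentially consistent, and models the missing
leading smp_mb() simply as CPU0's xchg being allowed to take effect before
its earlier READ_ONCE(x). The names forbidden_reachable() and count_bits()
are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Count the set bits in a small schedule mask. */
static int count_bits(unsigned v)
{
    int n = 0;
    for (; v; v >>= 1)
        n += (int)(v & 1u);
    return n;
}

/*
 * Returns true if some interleaving yields r1 == 1 && r2 == 0.
 *
 * cpu0_reordered == false: CPU0 runs READ_ONCE(x) then xchg(&y, 2),
 *                          i.e. the leading smp_mb() keeps them ordered.
 * cpu0_reordered == true : assumption for this model only; without the
 *                          leading barrier the xchg may take effect first.
 *
 * CPU1 always runs WRITE_ONCE(y, 1) then the release store to x, since
 * a release store keeps earlier stores ordered before it.
 */
bool forbidden_reachable(bool cpu0_reordered)
{
    /* A 4-bit mask with exactly two set bits is one interleaving:
     * a set bit means CPU0 runs its next operation in that slot. */
    for (unsigned sched = 0; sched < 16; sched++) {
        if (count_bits(sched) != 2)
            continue;

        int x = 0, y = 0;       /* shared variables, assumed initially 0 */
        int r1 = -1, r2 = -1;   /* CPU0 results */
        int step0 = 0, step1 = 0;

        for (int slot = 0; slot < 4; slot++) {
            if (sched & (1u << slot)) {
                /* CPU0: op 0 is READ_ONCE(x), op 1 is xchg(&y, 2). */
                int op = cpu0_reordered ? 1 - step0 : step0;
                if (op == 0) {
                    r1 = x;
                } else {
                    r2 = y;     /* xchg returns the old value ... */
                    y = 2;      /* ... and stores the new one */
                }
                step0++;
            } else {
                /* CPU1: WRITE_ONCE(y, 1), then smp_store_release(&x, 1). */
                if (step1 == 0)
                    y = 1;
                else
                    x = 1;
                step1++;
            }
        }
        assert(step0 == 2 && step1 == 2);

        if (r1 == 1 && r2 == 0)
            return true;
    }
    return false;
}
```

Calling forbidden_reachable(false) covers the ordered case and
forbidden_reachable(true) the case where the xchg is allowed to pass the
earlier load. In this simplified model the forbidden outcome only shows up
in the second case, which is the reason for the leading smp_mb().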