On Mon, Mar 29, 2021 at 8:50 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 29, 2021 at 08:01:41PM +0800, Guo Ren wrote:
> > u32 a = 0x55aa66bb;
> > u16 *ptr = &a;
> >
> > CPU0                  CPU1
> > =========             =========
> > xchg16(ptr, new)      while(1)
> >                           WRITE_ONCE(*(ptr + 1), x);
> >
> > When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
>
> Then I think your LL/SC is broken.

No, LR.W/SC.W is not broken. Quoting the RISC-V ISA manual,
<8.3 Eventual Success of Store-Conditional Instructions>:

"As a consequence of the eventuality guarantee, if some harts in an
execution environment are executing constrained LR/SC loops, and no
other harts or devices in the execution environment execute an
unconditional store or AMO to that reservation set, then at least one
hart will eventually exit its constrained LR/SC loop. By contrast, if
other harts or devices continue to write to that reservation set, it
is not guaranteed that any hart will exit its LR/SC loop."

So I think this is a feature of LR/SC. How does the above code (which
also uses LL/SC to implement xchg16) run on arm64?

1:      ldxr
        eor
        cbnz ... 2f
        stxr
        cbnz ... 1b   // I think it would deadlock on arm64.

Could the "LL/SC fwd progress" you mentioned guarantee that stxr
succeeds? How could hardware do that?

>
> That also means you really don't want to build super complex locking
> primitives on top, because that live-lock will percolate through.
>
> Step 1 would be to get your architecture fixed such that it can provide
> fwd progress guarantees for LL/SC. Otherwise there's absolutely no point
> in building complex systems with it.

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
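
For reference, a minimal sketch of the pattern under discussion: emulating a 16-bit exchange on top of a 32-bit atomic. It uses a C11 compare-exchange loop as a stand-in for the lr.w/sc.w (or ldxr/stxr) sequence. The name xchg16_emulated and its "half" parameter are hypothetical and only for illustration; this is not the kernel's implementation. The point is that the retry loop covers the whole 32-bit word, so a store to the other half by another CPU makes the attempt fail and retry.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's xchg16()): atomically replace one
 * 16-bit half of a 32-bit word and return the previous value of that half.
 * half == 0 selects bits 0..15, half == 1 selects bits 16..31.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned half,
                                uint16_t newval)
{
    unsigned shift = (half & 1u) * 16u;
    uint32_t mask = (uint32_t)0xffffu << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;

    do {
        /* Keep the neighbouring half as last observed, replace ours. */
        desired = (old & ~mask) | ((uint32_t)newval << shift);
        /*
         * The compare-exchange fails if ANY bit of the word changed,
         * including the half we do not care about. On failure, 'old'
         * is refreshed with the current value and we retry.
         */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
                                                    memory_order_seq_cst,
                                                    memory_order_relaxed));

    return (uint16_t)((old & mask) >> shift);
}
```

With a = 0x55aa66bb and no concurrent writer, xchg16_emulated(&a, 0, 0x1234) returns 0x66bb and leaves a == 0x55aa1234. Whether the loop is guaranteed to terminate while another CPU keeps storing to the other half depends on the forward-progress guarantee of the underlying hardware, which is exactly the question in this thread.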