On Fri, Aug 26, 2022 at 05:48:12AM -0700, Paul E. McKenney wrote: > Hello! > > I have not yet done more than glance at this one, but figured I should > send it along sooner rather than later. > > "Verifying and Optimizing Compact NUMA-Aware Locks on Weak > Memory Models", Antonio Paolillo, Hernán Ponce-de-León, Thomas > Haas, Diogo Behrens, Rafael Chehab, Ming Fu, and Roland Meyer. > https://arxiv.org/abs/2111.15240 > > The claim is that the queued spinlocks implementation with CNA violates > LKMM but actually works on all architectures having a formal hardware > memory model. > Translate one of their litmus test into a runnable one (there is a typo in it): C queued-spin-lock (* * P0 is lock-release * P1 is xchg_tail() * P2 is lock-acquire *) {} P0(int *x, atomic_t *l) { WRITE_ONCE(*x, 1); atomic_set_release(l, 1); } P1(int *x, atomic_t *l) { int val; int new; int old; val = atomic_read(l); new = val + 2; old = atomic_cmpxchg_relaxed(l, val, new); } P2(int *x, atomic_t *l) { int r0 = atomic_read_acquire(l); int r1 = READ_ONCE(*x); } exists (2:r0=3 /\ 2:r1=0) According to LKMM, the exist-clause could be triggered because: po-rel; coe: rfe; acq-po is not happen-before in LKMM. Alan actually explain why at a response to a GitHub issue: https://github.com/paulmckrcu/litmus/issues/11#issuecomment-1115235860 (Paste Alan's reply) """ As for why the LKMM doesn't require ordering for this test... I do not believe this is related to 2+2W. Think instead in terms of the LKMM's operational model: The store-release in P0 means that the x=1 write will propagate to each CPU before the y=1 write does. Since y=3 at the end, we know that y=1 (and hence x=1 too) propagates to P1 before the addition occurs. And we know that y=3 propagates to P2 before the load-acquire executes. But we _don't_ know that either y=1 or x=1 propagates to P2 before y=3 does! If the store in P1 were a store-release then this would have to be true (as you saw in your tests), but it isn't. In other words, the litmus test could execute with the following temporal ordering: P0 P1 P2 ---------- --------- ---------- Write x=1 Write y=1 [x=1 and y=1 propagate to P1] Read y=1 Write y=3 [y=3 propagates to P2] Read y=3 Read x=0 [x=1 and y=1 propagate to P2] (Note that when y=1 propagates to P2, it doesn't do anything because it won't overwrite the coherence-later store y=3.) It may be true that none of the architectures supported by Linux will allow this outcome for the test (although I suspect that the PPC-weird version _would_ be allowed if you fixed it). At any rate, disallowing this result in the memory model would probably require more effort than would be worthwhile. Alan """ The question is that whether we "fix" LKMM because of this, or we mention explicitly this is something "unsupported" by LKMM yet? Regards, Boqun > Thoughts? > > Thanx, Paul