> Z6.0+pooncelock+poonceLock+pombonce.litmus shows an example of > how full ordering is subtely incomplete without smp_mb__after_spinlock(). > > But still, smp_mb__after_unlock_lock() is supposed to be weaker than > smp_mb__after_spinlock() and yet I'm failing to produce a litmus test > that is successfull with the latter and fails with the former. smp_mb__after_unlock_lock() is a nop without a matching unlock-lock; smp_mb__after_spinlock() not quite... C after_spinlock__vs__after_unlock_lock { } P0(int *x, int *y, spinlock_t *s) { int r0; WRITE_ONCE(*x, 1); spin_lock(s); smp_mb__after_spinlock(); r0 = READ_ONCE(*y); spin_unlock(s); } P1(int *x, int *y) { int r1; WRITE_ONCE(*y, 1); smp_mb(); r1 = READ_ONCE(*x); } exists (0:r0=0 /\ 1:r1=0) > For example, and assuming smp_mb__after_unlock_lock() is expected to be > chained across locking, here is a litmus test inspired by > Z6.0+pooncelock+poonceLock+pombonce.litmus that never observes the condition > even though I would expect it should, as opposed to using > smp_mb__after_spinlock(): > > C smp_mb__after_unlock_lock > > {} > > P0(int *w, int *x, spinlock_t *mylock) > { > spin_lock(mylock); > WRITE_ONCE(*w, 1); > WRITE_ONCE(*x, 1); > spin_unlock(mylock); > } > > P1(int *x, int *y, spinlock_t *mylock) > { > int r0; > > spin_lock(mylock); > smp_mb__after_unlock_lock(); > r0 = READ_ONCE(*x); > WRITE_ONCE(*y, 1); > spin_unlock(mylock); > } > > P2(int *y, int *z, spinlock_t *mylock) > { > int r0; > > spin_lock(mylock); > r0 = READ_ONCE(*y); > WRITE_ONCE(*z, 1); > spin_unlock(mylock); > } > > P3(int *w, int *z) > { > int r1; > > WRITE_ONCE(*z, 2); > smp_mb(); > r1 = READ_ONCE(*w); > } > > exists (1:r0=1 /\ 2:r0=1 /\ z=2 /\ 3:r1=0) Here's an informal argument to explain this outcome. It is not the only according to the LKMM, but the first that came to my mind. And this is longer than I wished. TL; DR: Full barriers are strong, really strong. Remark full memory barriers share the following "strong-fence property": A ->full-barrier B implies (SFP) A propagates (aka, is visible) to _every CPU before B executes (cf. tools/memory-model/Documentation/explanation.txt for details about the concepts of "propagation" and "execution"). For example, in the snippet above, P0:WRITE_ONCE(*w, 1) ->full-barrier P1:spin_unlock(mylock) since P0:spin_unlock(mylock) ->reads-from P1:spin_lock(mylock) ->program-order P1:smp_mb__after_unlock_lock() acts as a full memory barrier. (1:r0=1 and 2:r0=1 together determine the so called critical-sections' order (CSO).) By contradiction, 1) P0:WRITE_ONCE(*w, 1) propagates to P3 before P1:spin_unlock(mylock) executes (SFP) 2) P1:spin_unlock(mylock) executes before P2:spin_lock(mylock) executes (CSO) 3) P2:spin_lock(mylock) executes before P2:WRITE_ONCE(*z, 1) executes (P2:spin_lock() is an ACQUIRE op) 4) P2:WRITE_ONCE(*z, 1) executes before P2:WRITE_ONCE(*z, 1) propagates P3 (intuitively, a store is visible to the local CPU before being visible to a remote CPU) 5) P2:WRITE_ONCE(*z, 1) propagates to P3 before P3:WRITE_ONCE(*z, 2) executes (z=2) 6) P3:WRITE_ONCE(*z, 2) executes before P3:WRITE_ONCE(*z, 2) propagates to P0 (a store is visible to the local CPU before being visible to a remote CPU) 7) P3:WRITE_ONCE(*z, 2) propagates to P0 before P3:READ_ONCE(*w) executes (SFP) 8) P3:READ_ONCE(*w) executes before P0:WRITE_ONCE(*w, 1) propagates to P3 (3:r1=0) Put together, (1-8) gives: P0:WRITE_ONCE(*w, 1) propagates to P3 before P0:WRITE_ONCE(*w, 1) propagates to P3 an absurd. Andrea