> Therefore this hang should be observable on a hypothetical LKMM processor
> which makes use of all the relaxed liberty the LKMM allows. However according
> to the authors of that paper (who are my colleagues but I haven't been involved
> deeply in that work), not even Power+gcc allow this reordering to happen, and if
> that's true it is probably because the wmb is mapped to lwsync which is fully
> cumulative in Power but not in LKMM.

All the "issues" we mention in the technical report are issues with respect to
LKMM. As shown by (*) below, as soon as the code is compiled and verified
against the corresponding hardware memory model, it is correct.

Here is a small variant of the litmus test I sent earlier in which not only is
the "problematic behavior" allowed by LKMM, but liveness is actually violated.
The code is written in C (the main function and headers are missing) and cannot
be used directly with herd7, since I am not sure whether the end of thread_3
(the spinloop) can be expressed in herd7 syntax.

------------------------------------------------------------------------
int y, z;
atomic_t x;

void *thread_1(void *unused)
{
	// clear_pending_set_locked
	int r0 = atomic_fetch_add(2, &x);
	return NULL;
}

void *thread_2(void *unused)
{
	// this store breaks liveness
	WRITE_ONCE(y, 1);
	// queued_spin_trylock
	int r0 = atomic_read(&x);
	// barrier after the initialisation of nodes
	smp_wmb();
	// xchg_tail
	int r1 = atomic_cmpxchg_relaxed(&x, r0, 42);
	// link node into the waitqueue
	WRITE_ONCE(z, 1);
	return NULL;
}

void *thread_3(void *unused)
{
	// node initialisation
	WRITE_ONCE(z, 2);
	// queued_spin_trylock
	int r0 = atomic_read(&x);
	// barrier after the initialisation of nodes
	smp_wmb();
	// if we read z==2 we expect to read this store
	WRITE_ONCE(y, 0);
	// xchg_tail
	int r1 = atomic_cmpxchg_relaxed(&x, r0, 24);
	// spinloop
	while (READ_ONCE(y) == 1 && READ_ONCE(z) == 2) {}
	return NULL;
}
------------------------------------------------------------------------

Liveness is violated (following Theorem 5.3 of the "Making weak memory models
fair" paper) because the reads in the spinloop can take their values from the
writes that come last in the coherence (modification) order. In the violating
execution those writes are y=1 (from thread_2) and z=2 (from thread_3), and
these values never stop the spinning.

------------------------------------------------------------------------
$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/linux-kernel.cat --target=lkmm --property=liveness liveness.c
...
Liveness violation found
FAIL
------------------------------------------------------------------------

(*) However, if the code is compiled (the tool does this automatically and
internally; note the --target option) and checked against a hardware memory
model, the tool reports that the code is correct:

------------------------------------------------------------------------
$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/aarch64.cat --target=arm8 --property=liveness liveness.c
...
PASS

$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/power.cat --target=power --property=liveness liveness.c
...
PASS

$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/riscv.cat --target=riscv --property=liveness liveness.c
...
PASS
------------------------------------------------------------------------

I think it is possible to show the liveness violation with herd7 using the
following variant of the code:

------------------------------------------------------------------------
C Liveness

{
	atomic_t x = ATOMIC_INIT(0);
	atomic_t y = ATOMIC_INIT(0);
}

P0(atomic_t *x)
{
	// clear_pending_set_locked
	int r0 = atomic_fetch_add(2, x);
}

P1(atomic_t *x, int *z, int *y)
{
	// this store breaks liveness
	WRITE_ONCE(*y, 1);
	// queued_spin_trylock
	int r0 = atomic_read(x);
	// barrier after the initialisation of nodes
	smp_wmb();
	// xchg_tail
	int r1 = atomic_cmpxchg_relaxed(x, r0, 42);
	// link node into the waitqueue
	WRITE_ONCE(*z, 1);
}

P2(atomic_t *x, int *z, int *y)
{
	// node initialisation
	WRITE_ONCE(*z, 2);
	// queued_spin_trylock
	int r0 = atomic_read(x);
	// barrier after the initialisation of nodes
	smp_wmb();
	// if we read z==2 we expect to read this store
	WRITE_ONCE(*y, 0);
	// xchg_tail
	int r1 = atomic_cmpxchg_relaxed(x, r0, 24);
	// spinloop
	int r2 = READ_ONCE(*y);
	int r3 = READ_ONCE(*z);
}

exists (z=2 /\ y=1 /\ 2:r2=1 /\ 2:r3=2)
------------------------------------------------------------------------

Condition "2:r3=2" forces the read of z that stands in for the spinloop to read
from the first write in P2, and "z=2" forces this write to be last in the
coherence order. Conditions "2:r2=1" and "y=1" force the same for the read of
y, which must read from the write in P1 and that write must be last in the
coherence order. herd7 says this behavior is allowed by LKMM, showing that
liveness can be violated.

In all the examples above, if we use mb() instead of wmb(), LKMM does not
accept the behavior, and thus liveness is guaranteed.

Hernan