Hi Daniel,

My two cents (summarizing some findings we discussed privately):

> I think adding the full barrier guarantees the following ordering, and the
> memory model people can correct me if I'm wrong:
>
> CPU21                      CPU22
> ------------------------   --------------------------
> UNLOCK pd->lock
> smp_mb()
> LOAD reorder_objects
>                            INC reorder_objects
>                            spin_unlock(&pqueue->reorder.lock) // release barrier
>                            TRYLOCK pd->lock
>
> So if CPU22 has incremented reorder_objects but CPU21 reads 0 for it, CPU21
> should also have unlocked pd->lock so CPU22 can get it and serialize any
> remaining jobs.

This information inspired me to write down the following litmus test:
(AFAICT, you independently wrote a very similar test, which is indeed
quite reassuring! ;D)

C daniel-padata

{ }

P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
{
	int r0;

	spin_lock(pd_lock);
	spin_unlock(pd_lock);
	smp_mb();
	r0 = atomic_read(reorder_objects);
}

P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	//smp_mb();
	r1 = spin_trylock(pd_lock);
}

exists (0:r0=0 /\ 1:r1=0)

It seems worth noticing that this test's "exists" clause is satisfiable
according to the (current) memory consistency model.  (Informally, this
can be explained by noticing that the RELEASE from the spin_unlock() in
P1 does not provide any order between the atomic increment and the read
part of the spin_trylock() operation.)  FWIW, uncommenting the smp_mb()
in P1 would suffice to prevent this clause from being satisfiable; I am
not sure, however, whether this approach is feasible or ideal... (sorry,
I'm definitely not too familiar with this code... ;/)

Thanks,
  Andrea
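
P.S.  For reference, a sketch of the variant mentioned above, with the
smp_mb() in P1 uncommented.  The test name "daniel-padata-mb" is made up
here for illustration; everything else is unchanged from the test above.
With full barriers on both sides, this has the shape of a store-buffering
test (P0: unlock, smp_mb(), load; P1: increment, smp_mb(), trylock), so the
"exists" clause is expected to be unsatisfiable, as stated above.

```
C daniel-padata-mb

{ }

P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
{
	int r0;

	spin_lock(pd_lock);
	spin_unlock(pd_lock);
	smp_mb();	/* orders the unlock before the load below */
	r0 = atomic_read(reorder_objects);
}

P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	smp_mb();	/* orders the increment before the trylock's read */
	r1 = spin_trylock(pd_lock);
}

exists (0:r0=0 /\ 1:r1=0)
```

Assuming the test is saved as daniel-padata-mb.litmus (again, a
hypothetical file name), it can be checked from tools/memory-model with:

	herd7 -conf linux-kernel.cfg daniel-padata-mb.litmus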