On Fri, May 22, 2015 at 06:30:29PM +0100, Will Deacon wrote:
> Hi Paul,
>
> On Thu, May 21, 2015 at 09:02:12PM +0100, Paul E. McKenney wrote:
> > On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote:
> > > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote:
> > > > On to #5:
> > > >
> > > >     r1 = atomic_load_explicit(&x, memory_order_consume);
> > > >     if (r1 == 42)
> > > >         atomic_store_explicit(&y, r1, memory_order_relaxed);
> > > >     ----------------------------------------------------
> > > >     r2 = atomic_load_explicit(&y, memory_order_consume);
> > > >     if (r2 == 42)
> > > >         atomic_store_explicit(&x, 42, memory_order_relaxed);
> > > >
> > > > The first thread's accesses are dependency ordered.  The second thread's
> > > > ordering is in a corner case that memory-barriers.txt does not cover.
> > > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not
> > > > a memory_order_consume load (AKA rcu_dereference and friends).  However,
> > > > Alpha would have a full barrier as part of the memory_order_consume load,
> > > > and the rest of the processors would (one way or another) respect the
> > > > control dependency.  And the compiler would have some fun trying to
> > > > break it.
> > >
> > > But this is interesting because the first thread is ordered whilst the
> > > second is not, so doesn't that effectively forbid the compiler from
> > > constant-folding values if it can't prove that there is no dependency
> > > chain?
> >
> > You lost me on this one.  Are you suggesting that the compiler
> > speculate the second thread's atomic store?  That would be very
> > bad regardless of dependency chains.
> >
> > So what constant-folding optimization are you thinking of here?
> > If the above example is not amenable to such an optimization, could
> > you please give me an example where constant folding would apply
> > in a way that is sensitive to dependency chains?
>
> Unless I'm missing something, I can't see what would prevent a compiler
> from looking at the code in thread1 and transforming it into the code in
> thread2 (i.e. constant folding r1 with 42 given that the taken branch
> must mean that r1 == 42). However, such an optimisation breaks the
> dependency chain, which means that a compiler needs to walk backwards
> to see if there is a dependency chain extending to r1.

Indeed!  Which is one reason that (1) integers are not allowed in
dependency chains with a very few extremely constrained exceptions and
(2) sequences of comparisons and/or undefined-behavior considerations
that allow the compiler to exactly determine the pointer value break
the dependency chain.

> > > > So the current Linux memory model would allow (r1 == 42 && r2 == 42),
> > > > but I don't know of any hardware/compiler combination that would
> > > > allow it.  And no, I am -not- going to update memory-barriers.txt for
> > > > this litmus test, its theoretical interest notwithstanding!  ;-)
>
> Of course, I'm not asking for that at all! I'm just trying to see how
> your proposal holds up with the example.

Whew!  ;-)

							Thanx, Paul
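
For concreteness, here is a minimal sketch of the constant-folding
transformation discussed above, written as plain C11.  The function names
(thread1_as_written, thread1_after_folding, single_threaded_check) are
hypothetical and chosen only for illustration; they come from neither the
kernel nor the proposal.  The sketch shows only that the two forms are
indistinguishable to single-threaded reasoning, which is what makes the
folding tempting.  It does not demonstrate any reordering, and current
compilers generally implement memory_order_consume by promoting it to
memory_order_acquire.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int x, y;

/*
 * Thread 1 of litmus test #5 as written: the stored value is r1 itself,
 * so the store is data-dependent on the consume load.
 */
int thread1_as_written(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_consume);

	if (r1 == 42)
		atomic_store_explicit(&y, r1, memory_order_relaxed);
	return r1;
}

/*
 * Hypothetical result of constant folding: inside the taken branch r1 is
 * known to equal 42, so the constant is stored instead.  The store no
 * longer uses r1, leaving only the control dependency on the branch
 * (the same shape as thread 2 in the litmus test).
 */
int thread1_after_folding(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_consume);

	if (r1 == 42)
		atomic_store_explicit(&y, 42, memory_order_relaxed);
	return r1;
}

/* Single-threaded check: both forms leave the same value in y. */
void single_threaded_check(int initial_x)
{
	int y_written, y_folded;

	atomic_store_explicit(&x, initial_x, memory_order_relaxed);

	atomic_store_explicit(&y, 0, memory_order_relaxed);
	thread1_as_written();
	y_written = atomic_load_explicit(&y, memory_order_relaxed);

	atomic_store_explicit(&y, 0, memory_order_relaxed);
	thread1_after_folding();
	y_folded = atomic_load_explicit(&y, memory_order_relaxed);

	assert(y_written == y_folded);
}
```

Calling single_threaded_check(42) and single_threaded_check(7) passes for
both inputs.  That is the point: nothing visible to a single thread
distinguishes the two forms, so a compiler that wants to preserve the
dependency has to know that r1 heads a dependency chain before folding.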