On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel <triegel@xxxxxxxxxx> wrote: > > I think the example is easy to misunderstand, because the context isn't > clear. Therefore, let me first try to clarify the background. > > (1) The abstract machine does not write speculatively. > (2) Emitting a branch instruction and executing a branch at runtime is > not part of the specified behavior of the abstract machine. Of course, > the abstract machine performs conditional execution, but that just > specifies the output / side effects that it must produce (e.g., volatile > stores) -- not with which hardware instructions it is producing this. > (3) A compiled program must produce the same output as if executed by > the abstract machine. Ok, I'm fine with that. > Thus, we need to be careful what "speculative store" is meant to refer > to. A few examples: > > if (atomic_load(&x, mo_relaxed) == 1) > atomic_store(&y, 3, mo_relaxed)); No, please don't use this idiotic example. It is wrong. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Anybody who argues anything else is wrong, or confused, or confusing. Instead, argue about *other* sequences where the compiler can do something. For example, this sequence: atomic_store(&x, a, mo_relaxed); b = atomic_load(&x, mo_relaxed); can validly be transformed to atomic_store(&x, a, mo_relaxed); b = (typeof(x)) a; and I think everybody agrees about that. In fact, that optimization can be done even for mo_strict. But even that "obvious" optimization has subtle cases. What if the store is relaxed, but the load is strict? You can't do the optimization without a lot of though, because dropping the strict load would drop an ordering point. So even the "store followed by exact same load" case has subtle issues. With similar caveats, it is perfectly valid to merge two consecutive loads, and to merge two consecutive stores. Now that means that the sequence atomic_store(&x, 1, mo_relaxed); if (atomic_load(&x, mo_relaxed) == 1) atomic_store(&y, 3, mo_relaxed); can first be optimized to atomic_store(&x, 1, mo_relaxed); if (1 == 1) atomic_store(&y, 3, mo_relaxed); and then you get the end result that you wanted in the first place (including the ability to re-order the two stores due to the relaxed ordering, assuming they can be proven to not alias - and please don't use the idiotic type-based aliasing rules). Bringing up your first example is pure and utter confusion. Don't do it. Instead, show what are obvious and valid transformations, and then you can bring up these kinds of combinations as "look, this is obviously also correct". Now, the reason I want to make it clear that the code example you point to is a crap example is that because it doesn't talk about the *real* issue, it also misses a lot of really important details. For example, when you say "if the compiler can prove that the conditional is always true" then YOU ARE TOTALLY WRONG. So why do I say you are wrong, after I just gave you an example of how it happens? Because my example went back to the *real* issue, and there are actual real semantically meaningful details with doing things like load merging. To give an example, let's rewrite things a bit more to use an extra variable: atomic_store(&x, 1, mo_relaxed); a = atomic_load(&1, mo_relaxed); if (a == 1) atomic_store(&y, 3, mo_relaxed); which looks exactly the same. If you now say "if you can prove the conditional is always true, you can make the store unconditional", YOU ARE WRONG. Why? This sequence: atomic_store(&x, 1, mo_relaxed); a = atomic_load(&x, mo_relaxed); atomic_store(&y, 3, mo_relaxed); is actually - and very seriously - buggy. Why? Because you have effectively split the atomic_load into two loads - one for the value of 'a', and one for your 'proof' that the store is unconditional. Maybe you go "Nobody sane would do that", and you'd be right. But compilers aren't "sane" (and compiler _writers_ I have some serious doubts about), and how you describe the problem really does affect the solution. My description ("you can combine two subsequent atomic accesses, if you are careful about still having the same ordering points") doesn't have the confusion. It makes it clear that no, you can't speculate writes, but yes, obviously you can combine certain things (think "write buffers" and "memory caches"), and that may make you able to remove the conditional. Which is very different from speculating writes. But my description also makes it clear that the transformation above was buggy, but that rewriting it as a = 1; atomic_store(&y, 3, mo_relaxed); atomic_store(&x, a, mo_relaxed); is fine (doing the re-ordering here just to make a point). So I suspect we actually agree, I just think that the "example" that has been floating around, and the explanation for it, has been actively bad and misleading. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html