On Tue, Feb 18, 2014 at 10:21 AM, Peter Sewell
<Peter.Sewell@xxxxxxxxxxxx> wrote:
>
> This is a bit more subtle, because (on ARM and POWER) removing the
> dependency and conditional branch is actually in general *not* equivalent
> in the hardware, in a concurrent context.

So I agree, but I think that's a generic issue with non-local memory
ordering, and is not at all specific to the optimization wrt that
"x?42:42" expression.

If you have a value that you loaded with a non-relaxed load, and you
pass that value off to a non-local function that you don't know what
it does, in my opinion that implies that the compiler had better add
the necessary serialization to say "whatever that other function does,
we guarantee the semantics of the load".

So on ppc, if you do a load with "consume" or "acquire" and then call
another function without having had something in the caller that
serializes the load, you'd better add the lwsync or whatever before
the call. Exactly because the function call itself otherwise basically
breaks the visibility into ordering. You've basically turned a
load-with-ordering-guarantees into just an integer that you passed off
to something that doesn't know about the ordering guarantees - and you
need that "lwsync" in order to still guarantee the ordering.

Tough titties. That's what a CPU with weak memory ordering semantics
gets in order to have sufficient memory ordering.

And I don't think it's actually a problem in practice. If you are
doing loads with ordered semantics, you're not going to pass the
result off willy-nilly to random functions (or you really *do* require
the ordering, because the load that did the "acquire" was actually for
a lock!)

So I really think that the "local optimization" is correct regardless.

               Linus
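
For illustration only, here is a minimal C11 sketch of the situation described above: a value obtained from an ordered load is handed to a function that sees nothing but a plain integer. All names (`payload`, `ready`, `opaque_use`, `publish`, `reader`) are hypothetical and do not come from the thread. The sketch assumes the callee is opaque to the caller, for example because it lives in another translation unit. Under the argument above, a compiler targeting a weakly ordered CPU would have to emit whatever barrier the acquire needs in `reader()` itself, before the call, because nothing in the callee can restore the ordering.

```c
#include <stdatomic.h>

/* Hypothetical shared state: plain data guarded by an atomic flag. */
static int payload;
static atomic_int ready;

/* Stand-in for a non-local function. It receives a plain int and has
 * no way of knowing that the int came from an ordered load. */
int opaque_use(int flag)
{
    return flag ? payload : -1;
}

/* Writer side: store the data, then publish it with a release store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader side: the acquire load carries the ordering guarantee. Once
 * `flag` is passed on, the guarantee must already be in force, so any
 * barrier the target requires has to be emitted here, before the call. */
int reader(void)
{
    int flag = atomic_load_explicit(&ready, memory_order_acquire);
    return opaque_use(flag);
}
```

Calling `reader()` before `publish()` returns -1; after `publish(42)` it returns 42. The point of the sketch is that `opaque_use()` is correct only because the ordering was established in the caller.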