On Fri, Feb 21, 2014 at 10:25 AM, Peter Sewell <Peter.Sewell@xxxxxxxxxxxx> wrote:
>
> If one thinks this is too fragile, then simply using memory_order_acquire
> and paying the resulting barrier cost (and perhaps hoping that compilers
> will eventually be able to optimise some cases of those barriers to
> hardware-level dependencies) is the obvious alternative.

No, the obvious alternative is to do what we do already, and just do
it by hand. Using acquire is *worse* than what we have now.

Maybe for some other users, the thing falls out differently.

> Many concurrent things will "accidentally" work on x86 - consume is not
> special in that respect.

No. But if you have something that is mis-designed, easy to get wrong,
and untestable, any sane programmer will go "that's bad".

> There are two difficulties with this, if I understand correctly what you're
> proposing.
>
> The first is knowing where to stop.

No.

Read my suggestion. Knowing where to stop is *trivial*.

Either the dependency is immediate and obvious, or you treat it like
an acquire.

Seriously. Any compiler that doesn't turn the dependency chain into
SSA or something pretty much equivalent is pretty much a joke. Agreed?

So we can pretty much assume that the compiler will have some
intermediate representation as part of optimization that is basically
SSA.

So what you do is,

 - build the SSA by doing all the normal parsing and possible
   tree-level optimizations you already do even before getting to the
   SSA stage

 - do all the normal optimizations/simplifications/cse/etc that you do
   normally on SSA

 - add *one* new rule to your SSA simplification that goes something
   like this:

   * when you see a load op that is marked with a "consume" barrier,
     just follow the usage chain that comes from that.
   * if you hit a normal arithmetic op, just follow the result chain
     of that
   * if you hit a memory operation address use, stop and say "looks
     good"
   * if you hit anything else (including a copy/phi/whatever), abort
   * if nothing aborted as part of the walk, you can now just remove
     the "consume" barrier.

You can fancy it up and try to follow more cases, but realistically
the only case that really matters is the "consume" being fed directly
into one or more loads, with possibly an offset calculation in
between. There are certainly more cases where you could *try* to
remove the barrier, but the thing is, it's never incorrect to not
remove it, so any time you get bored or hit any complication at all,
just do the "abort" part.

I *guarantee* that if you describe this to a compiler writer, he will
tell you that my scheme is about a billion times simpler than the
current standard wording. Especially after you've pointed him to that
gcc bugzilla entry and explained to him about how the current standard
cares about those kinds of made-up syntactic chains that he likely
removed quite early, possibly even as he was generating the semantic
tree.

Try it. I dare you. So if you want to talk about "difficulties", the
current C standard loses.

> The second is the proposal in later mails to use some notion of "semantic"
> dependency instead of this syntactic one.

Bah.

The C standard does that all over. It's called "as-if". The C standard
talks about how the compiler can do pretty much whatever it likes, as
long as the end result acts the same in the virtual C machine.

So claiming that "semantics" being meaningful is somehow complex is
bogus. People do that all the time. If you make it clear that the
dependency chain is through the *value*, not syntax, and that the
value can be optimized all the usual ways, it's quite clear what the
end result is.
Any operation that actually meaningfully uses the value is serialized
with the load, and if there is no meaningful use that would affect the
end result in the virtual machine, then there is no barrier.

Why would this be any different, especially since it's easy to
understand both for a human and a compiler?

                 Linus
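
For illustration, the one new SSA rule described in the message above
can be sketched as a small walk over use chains. This is only a sketch
under stated assumptions: the Value and Use classes, the op names, and
can_drop_consume_barrier are all hypothetical and do not correspond to
any real compiler's IR. Python is used purely for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical op names for a toy SSA IR; no real compiler's IR is implied.
ARITHMETIC_OPS = {"add", "sub", "mul", "shl"}
MEMORY_OPS = {"load", "store"}


@dataclass
class Value:
    """An SSA value together with the instructions that use it."""
    name: str
    uses: List["Use"] = field(default_factory=list)


@dataclass
class Use:
    """One use of a value by an instruction."""
    op: str                          # kind of the using instruction
    role: str                        # "address", "operand" or "data"
    result: Optional[Value] = None   # value the using instruction defines, if any


def can_drop_consume_barrier(loaded: Value) -> bool:
    """Walk the use chains of a consume-marked load.

    Returns True only if every chain passes through plain arithmetic and
    ends as the address of a memory operation. Anything else aborts, which
    is always safe: the barrier simply stays in place.
    """
    worklist = [loaded]
    seen = set()
    while worklist:
        value = worklist.pop()
        if id(value) in seen:
            continue
        seen.add(id(value))
        for use in value.uses:
            if use.op in ARITHMETIC_OPS and use.result is not None:
                worklist.append(use.result)   # follow the result chain
            elif use.op in MEMORY_OPS and use.role == "address":
                continue                      # address use: "looks good"
            else:
                return False                  # copy/phi/whatever: abort
    return True


# p = consume-load; q = p + offset; load [q]
p, q = Value("p"), Value("q")
p.uses.append(Use("add", "operand", result=q))
q.uses.append(Use("load", "address"))
print(can_drop_consume_barrier(p))   # prints True

# The same kind of load feeding a phi node: the walk aborts.
m = Value("m")
m.uses.append(Use("phi", "operand", result=Value("m2")))
print(can_drop_consume_barrier(m))   # prints False
```

The walk is deliberately conservative. It returns False for every case
it does not recognize, matching the point that it is never incorrect
to leave the barrier in.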