On Wed, Jun 9, 2021 at 9:13 AM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> I had a longer discussion with someone offline about it, and the
> problem with a builtin is similar to the "memory_order_consume
> implementation problem"

The "memory_order_consume" problem is *entirely* artificial, and due
to the C standards body's incompetence.

Really. I was there. Only very peripherally, but I was involved
enough to know what the problem was.

And the problem wasn't the concept of 'consume'. The problem was
entirely and 100% the incorrect model that the C standards people
used to describe the problem.

The C standards people took a "syntax and type based" approach to the
whole thing, and it was an utter disaster. It's the wrong model
entirely, because it became very very hard to describe the issue in
terms of optimizations of expressions and ordering at a syntactic
level.

What the standard _should_ have done, is to describe it in the same
terms that "volatile" is described - make all memory accesses
"visible in the virtual machine", and then specify the memory
ordering requirements within that virtual machine.

We have successful examples of that from other languages. I'm sorry
if this hurts some C language lawyer's fragile ego, but Christ, Java
did it better. Java! A language that a lot of people love to piss on.
But it did memory ordering fundamentally better.

And it's not like it would even have been a new concept. The notion
of "volatile" has been there since the very beginning of C. Yes, yes,
the C++ people screwed it up mightily and confused themselves about
what an "access" means. But "volatile" is actually a lot better
specified than the memory ordering requirements were, and the
specifications are (a) simpler and (b) much *much* easier for a
compiler person to understand.

Plus with memory ordering described as an operation - rather than as
a type - even the C++ confusion of volatile would have gone away.

So the very thing that likely made people want to avoid the "visible
access in the virtual machine" model didn't even _exist_ in the first
place.

So the language committee pointlessly said "volatile is bad, we need
to do something else", and came up with something that was an order
of magnitude worse than volatile, and that simply _couldn't_ possibly
sanely handle that "problem of consume".

But the problem was always purely about the model used to _describe_
the issue being bad, not the issue itself.

The "consume" memory ordering is actually very easy to describe in
the "as if" virtual machine memory model (well, as easy as _any_
memory ordering is).

If the C standards committee hadn't picked the wrong way to describe
things, the problem simply would not exist. Really.

And I guarantee you that compiler writers would have had an easier
time with that "virtual memory model" approach too. No, memory
ordering sure as hell isn't simple to understand for *anybody*, but
it got about a million times worse by using the wrong abstraction
layer to try to "explain" it.

It really is fairly easy to explain what "acquire" is at a virtual
machine model level. About as easy as memory ordering gets. For a
compiler writer, it basically turns into "you have to do the actual
access using XYZ, and then you can't move later memory operations to
before it". End of story.

So you can actually describe these things in a fairly straightforward
manner if you actually do it at that virtual machine level, because
that's literally the language that the hardware itself works at.

And then you could easily have defined "consume" as being the same
thing as "acquire", except that you can drop the special XYZ access
(fence, ld.acq, whatever) and replace it with a plain load if there
are only data dependencies on the loaded value (assuming, of course,
that your target hardware then supports that ordering requirement:
alpha would _always_ need the barrier).

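To make the acquire-versus-consume distinction above concrete, here
is a minimal sketch using C11 <stdatomic.h>. It is an illustration
under stated assumptions, not anything from the kernel or from the
thread: the names `struct msg`, `publish`, `read_acquire` and
`read_consume` are invented for this example, and -1 is an arbitrary
"nothing published yet" value. The two readers differ only in the
requested ordering. In the consume reader the one later access is
data-dependent on the loaded pointer, which is exactly the case where
the special access could in principle be relaxed to a plain load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message type, made up for this sketch. */
struct msg { int payload; };

static struct msg slot;                  /* plain, non-atomic data   */
static _Atomic(struct msg *) published;  /* NULL until data is ready */

/* Writer: fill in the data, then publish the pointer with release
 * ordering so the payload store stays ordered before the pointer store. */
void publish(int value)
{
    slot.payload = value;
    atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Reader, acquire flavour: the load is the "special XYZ access", and
 * later memory operations may not be moved to before it. */
int read_acquire(void)
{
    struct msg *m = atomic_load_explicit(&published, memory_order_acquire);
    return m ? m->payload : -1;   /* -1: nothing published yet */
}

/* Reader, consume flavour: the only later access, m->payload, carries
 * a data dependency on the loaded pointer. */
int read_consume(void)
{
    struct msg *m = atomic_load_explicit(&published, memory_order_consume);
    return m ? m->payload : -1;   /* -1: nothing published yet */
}
```

In real use `publish()` would run in one thread and the readers in
another. Note that GCC documents its consume ordering as currently
implemented with the stronger acquire ordering, so with that compiler
the two readers should generate the same code.
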
That could literally have been done as a peephole optimization, and a
compiler writer would never have had to even really worry about it.
Easy peasy. 99% of all compiler writers would not have to know
anything about the issue, there would be just one very special
optimization at the end that allows you to drop a barrier (or turn a
"ld.acq" into just an "ld") once you see all the uses of that loaded
value. A trivial peephole will handle 99% of all cases, and then for
the rest you just keep it as acquire.

So anybody who tells you that "consume is complicated" is wrong.
Consume is *not* complicated. They've just chosen the wrong model to
describe it.

Look, memory ordering pretty much _is_ the rocket science of CS, but
the C standards committee basically made it a ton harder by
specifying "we have to make the rocket out of duct tape and bricks,
and only use liquid hydrogen as a propellant".

                  Linus