On Mon, Jun 07, 2021 at 11:01:39AM +0300, Alexander Monakov wrote:
> Uhh... I was not talking about some (non-existent) "optimizing linker".
> LTO works by relaunching the compiler from the linker and letting it
> consume multiple translation units (which are fully preprocessed by that
> point). So the very thing you wanted to avoid -- such barriers appearing
> in close proximity where they can be deduplicated -- may arise after a
> little bit of cross-unit inlining.
>
> My main point here is that using __COUNTER__ that way (making things
> "unique" for the compiler) does not work in general when LTO enters the
> picture. As long as that is remembered, I'm happy.

Yup. Exactly the same issue as using this in any function that may end up
inlined.

> > In the case of "volatile_if()", we actually would like to have not a
> > memory clobber, but a "memory read". IOW, it would be a barrier for
> > any writes taking place, but reads can move around it.
> >
> > I don't know of any way to express that to the compiler. We've used
> > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > barrier in practice, so you can do something like make the memory
> > input to the asm be a big array). But that turned out to be fairly
> > unreliable, so now we use memory clobbers even if we just mean "reads
> > random memory".
>
> So the barrier which is a compiler barrier but not a machine barrier is
> __atomic_signal_fence(model), but internally GCC will not treat it smarter
> than an asm-with-memory-clobber today.

It will do nothing for relaxed ordering, and do blockage for everything
else. Can it do anything weaker than that?


Segher
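For reference, a minimal C sketch of the three compiler-barrier forms discussed in this exchange, assuming GCC or Clang. The helper names (`barrier_clobber`, `barrier_signal_fence`, `barrier_read_hack`), the `struct big_read_blob` type and its 64 KiB size, and the `publish()` demo are hypothetical illustrations, not kernel or GCC APIs. None of the three emits a machine fence; they only constrain what the compiler may reorder.

```c
#include <assert.h>

/* Full compiler barrier: no instruction is emitted, but the "memory"
   clobber tells the compiler that any memory may be read or written
   here, so no load or store is moved across it. */
static inline void barrier_clobber(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* The builtin compiler-only fence. Per the thread, GCC does nothing for
   __ATOMIC_RELAXED and treats stronger models as a full compiler
   barrier, i.e. no better than the asm clobber above. */
static inline void barrier_signal_fence(void)
{
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
}

/* Hypothetical type whose only purpose is to make an asm input look
   like a large block read (BLKmode in GCC terms). */
struct big_read_blob { char bytes[1 << 16]; };

/* Approximation of a "memory read" barrier: the asm claims to read a
   64 KiB object starting at p. Stores that may alias that object should
   not cross the asm, while loads can, since the asm writes nothing.
   Which variables count as aliasing is up to the compiler's alias
   analysis, so this is not a dependable barrier. */
static inline void barrier_read_hack(const void *p)
{
    __asm__ __volatile__("" : : "m" (*(const struct big_read_blob *)p));
}

int shared_data;
int shared_flag;

/* Hypothetical demo: store data, then flag. The barriers only stop the
   compiler from reordering these accesses; on weakly ordered CPUs the
   hardware may still reorder the stores. */
int publish(int value)
{
    shared_data = value;
    barrier_clobber();               /* no memory access moves across */
    shared_flag = 1;
    barrier_signal_fence();          /* same effect in current GCC */
    barrier_read_hack(&shared_data); /* pins stores that may alias shared_data */
    return shared_data + shared_flag;
}
```

Calling `publish(41)` leaves `shared_data == 41` and `shared_flag == 1` and returns 42. The interesting part is the generated code, not the result: comparing the assembly for each barrier shows which loads and stores the compiler kept in place.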