On Sun, Jun 6, 2021 at 6:03 AM Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> >
> > I think it's something of a bug when it comes to "asm volatile", but
> > the documentation isn't exactly super-specific.
>
> Why would that be? "asm volatile" does not prevent optimisation.

Sure it does. That's the whole and only *POINT* of the "volatile".

It's the same as a volatile memory access. That very much prevents
certain optimizations. You can't just join two volatile reads or
writes, because they have side effects.

And the exact same thing is true of inline asm. Even when they are
*identical*, inline asms have side effects that gcc simply doesn't
understand.

And yes, those side effects can - and do - include "you can't just
merge these".

> It says this code has some unspecified side effect, and that is all!

And that should be sufficient. But gcc then violates it, because gcc
doesn't understand the side effects.

Now, the side effects may be *subtle*, but they are very very real.
Just placement of code wrt a branch will actually affect memory
ordering, as that one example was.

> > Something like this *does* seem to work:
> >
> > #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> > #define __barrier(id) ____barrier(id)
> > #define barrier() __barrier(__COUNTER__)
> >
> > which is "interesting" or "disgusting" depending on how you happen to feel.
>
> __COUNTER__ is a preprocessor thing, much more like what you want here:
> this does its work *before* everything the compiler does, while %= does
> its thing *after* :-)
>
> (Not that I actually understand what you are trying to do with this).

See my previous email for why two barriers in two different code
sequences cannot just be joined into one and moved into the common
parent. It actually is semantically meaningful *where* they are, and
they are distinct barriers.

The case we happen to care about is memory ordering issues.
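
To make "two barriers in two different code sequences" concrete, here
is a minimal sketch in C of a conditional whose two arms contain the
same barrier-plus-store sequence. The barrier macros are the ones
quoted above. Everything else (LOAD_ONCE, STORE_ONCE, the variables a
and b, and the function name) is a hypothetical name made up for this
sketch. It assumes GCC or Clang and an assembler that treats "#" as the
start of a comment, as on x86.

```c
/* Barrier macros as quoted in the message above. Each use of barrier()
 * gets its own id from __COUNTER__, so no two asm strings are identical. */
#define ____barrier(id) __asm__ __volatile__("#" #id : : : "memory")
#define __barrier(id)   ____barrier(id)
#define barrier()       __barrier(__COUNTER__)

/* Hypothetical helpers for this sketch: force exactly one load or store
 * by going through a volatile-qualified access. */
#define LOAD_ONCE(x)     (*(const volatile int *)&(x))
#define STORE_ONCE(x, v) (*(volatile int *)&(x) = (v))

int a;  /* hypothetical: value tested by the conditional */
int b;  /* hypothetical: the store to b must stay inside the conditional */

/* "if A then write B", with a textually identical tail in both arms.
 * Returns which arm ran so that a caller can observe the control flow. */
int write_b_after_reading_a(void)
{
	if (LOAD_ONCE(a)) {
		barrier();          /* one barrier, with its own id */
		STORE_ONCE(b, 1);
		return 1;
	} else {
		barrier();          /* a second, distinct barrier */
		STORE_ONCE(b, 1);
		return 0;
	}
}
```

The intent is that the differing asm strings keep the compiler from
treating the two arms as one common sequence and hoisting the barrier
and the store above the branch. The sketch only shows the shape of the
code; it does not by itself prove what any given compiler version does.
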
The example quoted may sound pointless and insane, and I actually don't
believe we have real code that triggers the issue, because whenever we
have a conditional barrier, the two sides of the conditional are
generally so different that gcc would never merge any of it anyway.

So the issue is mostly theoretical, but we do have code that is fairly
critical, and that depends on memory ordering, and on some weakly
ordered machines (which is where all these problems would happen),
actual explicit memory barriers are also *much* too expensive.

End result: we have code that depends on the fact that a read-to-write
ordering exists if there is a data dependency or a control dependency
between the two. No actual expensive CPU instruction to specify the
ordering, because the ordering is implicit in the code flow itself.

But that's what we need a compiler barrier for in the first place -
the compiler certainly doesn't understand about this very subtle
memory ordering issue, and we want to make sure that the code sequence
*remains* that "if A then write B".

Linus