On Tue, Jun 4, 2013 at 3:58 PM, dw <limegreensocks@xxxxxxxxx> wrote:
>
> To create a processor fence, you could do something like
>
>     __builtin_ia32_mfence();

A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
(or __atomic_signal_fence).

> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
> barrier?

That is correct: it does not prevent the compiler from moving loads
and stores across the call to __builtin_ia32_mfence.

> 1) Is this "two statement thing" guaranteed to be safe?  Could the optimizer
> re-order instructions moving code in between the two?  (Yes, I realize that
> the asm statement doesn't actually generate any output.  But given my
> understanding of how the compiler processes code, I believe the question is
> still valid).

It is probably safe, since the compiler has no reason to put anything
in between the two statements, but it is not absolutely guaranteed to
be safe.

> 2) If it is not guaranteed to be safe, what is the use of
> __builtin_ia32_mfence()?  What value is there in preventing the *processor*
> from executing statements out of order if the *compiler* is just going to
> move them around?

__builtin_ia32_mfence exists to support the Intel-documented
_mm_mfence intrinsic.  I'm not clear on whether _mm_mfence is meant to
be a compiler memory barrier or not.  If it is, then I think GCC has a
bug in the way it is implemented.  Please feel free to file a bug
report at http://gcc.gnu.org/bugzilla/ , especially if you can come up
with a case that fails.

> I expect this would always work:
>
>     asm ("mfence" ::: "memory");
>
> But I would rather use the builtins if possible.

Yes, you should use the builtins: specifically the __atomic builtins,
which work better and are portable across processors.

Ian
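
For illustration, here is a minimal sketch of how the __atomic builtins
recommended above might be used in place of __builtin_ia32_mfence.  It
is not taken from the original thread: the variables payload and ready
and the functions publish, try_consume, and compiler_only_barrier are
hypothetical names chosen for the example.  It assumes GCC (or a
compatible compiler) and a simple "write data, then set flag" pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state, named for illustration only. */
static int payload;   /* ordinary (non-atomic) data */
static int ready;     /* flag accessed with the __atomic builtins */

/* Writer side: store the data, fence, then set the flag.
 * __atomic_thread_fence stops the compiler from moving loads and
 * stores across it and also emits whatever hardware fence the
 * target needs for the requested memory order. */
static void publish(int value)
{
    payload = value;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader side: returns 1 and fills *out once the flag is set,
 * otherwise returns 0 and leaves *out untouched. */
static int try_consume(int *out)
{
    assert(out != NULL);
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return 0;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    *out = payload;
    return 1;
}

/* Compiler-only barrier: __atomic_signal_fence constrains the
 * compiler's reordering but emits no fence instruction.  It is meant
 * for ordering between a thread and a signal handler running in that
 * same thread. */
static void compiler_only_barrier(void)
{
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
}
```

In this sketch one thread would call publish() while another polls
try_consume().  A weaker memory order (for example __ATOMIC_RELEASE in
the writer and __ATOMIC_ACQUIRE in the reader) would likely be enough
for this particular pattern; __ATOMIC_SEQ_CST is used only because it
is the order named in the reply above.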