1) Am I right that __builtin_ia32_mfence() does not generate a memory barrier?
That is correct: it does not prevent the compiler from moving loads and stores across the call to __builtin_ia32_mfence.
Are you sure? Based on your comment, I was fully expecting to be able to produce a failure case suitable for bugzilla.

In fact, I *can* generate failure cases if I comment the __builtin_ia32_mfence() call out of _mm_mfence and replace it with something else (like asm("mfence")). But as soon as I put the __builtin_ia32_mfence call back in, my "failure scenario" clears right up.

In short, it looks like __builtin_ia32_mfence *does* generate a barrier. But so do other builtins (like __builtin_ia32_pause). Does that even seem possible? It would be weird if every builtin (or even every ia32 builtin) implied a barrier.

dw
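
For reference, here is a minimal sketch of the two asm variants involved in this kind of comparison, assuming GCC. It is not the actual test case from the experiment described in this message, and every name in it (fence_insn_only, fence_with_clobber, publish_then_read, data, ready) is made up for illustration. The only difference between the two helpers is the "memory" clobber, which is what tells the compiler not to move or cache ordinary loads and stores across the statement.

```c
#include <assert.h>

/* All names below are hypothetical; none come from GCC's headers. */

static int data;   /* ordinary (non-volatile) shared variable */
static int ready;  /* flag that another thread would poll */

/* Fence instruction only. Without a "memory" clobber, GCC is still
   allowed to move ordinary loads and stores across this statement. */
static inline void fence_insn_only(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" : : );
#else
    __asm__ __volatile__("" : : );   /* placeholder on non-x86 targets */
#endif
}

/* Fence instruction plus compiler barrier. The "memory" clobber makes
   GCC assume that memory may be read or written by the statement. */
static inline void fence_with_clobber(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __sync_synchronize();            /* GCC's generic full barrier */
#endif
}

/* Store data, fence, then set the flag. Single-threaded, so the return
   value is always the value just stored; the fence only matters to
   other threads observing data and ready. */
static int publish_then_read(int value)
{
    data = value;
    fence_with_clobber();
    ready = 1;
    return data;
}

/* Tiny usage example. */
static inline void self_check(void)
{
    fence_insn_only();
    assert(publish_then_read(42) == 42);
    assert(ready == 1);
}
```

Swapping fence_with_clobber for fence_insn_only in publish_then_read and comparing the output of gcc -O2 -S is one way to look for a difference. Whether the two stores actually get reordered depends on the compiler version and the surrounding code, so the absence of a visible difference does not prove that a barrier is present.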