> As far as I know, __builtin_ia32_mfence does not generate a barrier.
> However, what it does do is appear to be a function call to the main
> optimization stages of the compiler.
OK, now that makes sense. I kept suspecting that the compiler was
treating this as a function call, but I kept discarding that theory
because the inline function didn't change anything.
I'm going to have to ponder the performance implications of this. For
example, it seems possible that asm("pause") could end up generating
better code than _mm_pause().
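
To make the comparison concrete, here is a minimal sketch of the two spellings side by side. The helper names (spin_hint_intrinsic, spin_hint_asm, spin_until_set) are made up for illustration, and the non-x86 fallback is only there so the file compiles elsewhere. It makes no claim about which variant produces better code; that would have to be checked by comparing the `gcc -O2 -S` output for each.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>          /* _mm_pause */
#define SKETCH_X86 1
#endif

/* Variant A: the intrinsic. */
static inline void spin_hint_intrinsic(void)
{
#ifdef SKETCH_X86
    _mm_pause();
#endif
}

/* Variant B: raw inline asm. There is deliberately no "memory" clobber,
 * so this statement by itself does not ask the compiler to keep loads
 * and stores from moving across it. */
static inline void spin_hint_asm(void)
{
#ifdef SKETCH_X86
    __asm__ __volatile__("pause");
#endif
}

/* Hypothetical bounded spin-wait: returns how many times it spun.
 * The flag is volatile so that every iteration re-reads it. */
static int spin_until_set(volatile int *flag, int max_spins, int use_asm)
{
    int spins = 0;

    assert(flag);
    while (!*flag && spins < max_spins) {
        if (use_asm)
            spin_hint_asm();
        else
            spin_hint_intrinsic();
        ++spins;
    }
    return spins;
}
```

Compiling a caller of spin_until_set once with use_asm set to 0 and once with 1, then diffing the assembly, is one way to see whether the surrounding code actually differs.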
> However, it is still possible for memory loads and stores to move
> after RTL
While it may be possible, I am unable to cause it to happen. Without a
solid example, or authoritative docs describing _mm_mfence as performing
a ReadWriteBarrier (and preferably both), I'm hard-pressed to think of a
credible way to file this in bugzilla.
On this mildly unsatisfactory note, I'm going to assume that _mm_?fence
will work properly and cross my fingers. If I eventually find this not
to be true, I'll head straight to bugzilla.
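
For anyone who would rather not rely on that assumption, one hedged alternative is to spell the compiler-level barrier explicitly next to the intrinsic. In GCC, an asm statement with a "memory" clobber tells the compiler not to move memory accesses across it, independently of how _mm_mfence itself is treated. The sketch below assumes GCC-style inline asm; the names compiler_barrier, full_fence, publish and consume are hypothetical, and the example is single-threaded, so it only illustrates where the barriers go.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_mfence */
#define SKETCH_HAVE_MFENCE 1
#endif

/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * stops GCC from moving loads and stores across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hardware fence bracketed by explicit compiler barriers. Without SSE2
 * this sketch degrades to a compiler-only barrier, which is NOT a
 * hardware fence. */
static inline void full_fence(void)
{
    compiler_barrier();
#ifdef SKETCH_HAVE_MFENCE
    _mm_mfence();
#endif
    compiler_barrier();
}

static int payload;
static int ready;

/* Hypothetical publisher: the payload store must not sink below the
 * store to the ready flag. */
static void publish(int value)
{
    payload = value;
    full_fence();
    ready = 1;
}

/* Hypothetical consumer, shown single-threaded for illustration only. */
static int consume(void)
{
    assert(ready && "consume() called before publish()");
    full_fence();
    return payload;
}
```

If the extra barriers turn out to be redundant, they cost nothing at run time, since compiler_barrier() generates no instructions.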
Thanks for the help.
dw