Re: Question about __builtin_ia32_mfence and memory barriers

Ian Lance Taylor <iant@xxxxxxxxxx> · Tue, 4 Jun 2013 16:52:26 -0700

On Tue, Jun 4, 2013 at 3:58 PM, dw <limegreensocks@xxxxxxxxx> wrote:
>
> To create a
> processor fence, you could do something like
>
>     __builtin_ia32_mfence();

A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
(or __atomic_signal_fence, if you only need a compiler barrier).
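
As a rough sketch of how the two fences differ (payload, ready, publish,
try_consume and compiler_barrier are names made up for illustration, not
anything from GCC), a writer and a reader could be ordered like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state, named for illustration only. */
int payload;
int ready;

/* Writer: store the data, fence, then raise the flag.  The fence is a
   compiler barrier and, where the target needs one, a hardware fence
   (typically mfence or an equivalent locked instruction on x86). */
void publish(int value)
{
    payload = value;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader: check the flag, fence, then read the data.
   Returns 1 and fills *out if the data was published, else 0. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return 0;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    *out = payload;
    return 1;
}

/* Compiler-only barrier: orders accesses with respect to a signal
   handler running in the same thread; no fence instruction is emitted. */
void compiler_barrier(void)
{
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
}
```

The thread fence is the one to reach for when another CPU is involved;
the signal fence only stops the compiler from reordering.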

> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
> barrier?

That is correct as far as the compiler is concerned: the builtin emits
the mfence instruction, but it does not prevent the compiler from moving
loads and stores across the call to __builtin_ia32_mfence.

> 1) Is this "two statement thing" guaranteed to be safe?  Could the optimizer
> re-order instructions moving code in between the two? (Yes, I realize that
> the asm statement doesn't actually generate any output.  But given my
> understanding of how the compiler processes code, I believe the question is
> still valid).

It is probably safe, since the compiler has no reason to move anything
between the two statements, but it is not absolutely guaranteed to be
safe.
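
For reference, I am assuming the "two statement thing" looks roughly
like the macro below: an empty asm with a "memory" clobber followed by
the builtin.  FULL_FENCE and store_then_fence are hypothetical names,
the order of the two statements is my guess, and the #else branch is
only there so the sketch still compiles on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
/* x86 with SSE2: a compiler barrier, then the hardware fence.  These are
   two separate statements; nothing ties them together syntactically. */
#define FULL_FENCE()                               \
    do {                                           \
        __asm__ __volatile__("" ::: "memory");     \
        __builtin_ia32_mfence();                   \
    } while (0)
#else
/* Other targets: fall back to the portable builtin. */
#define FULL_FENCE() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Store a value, issue the fence, and read the value back. */
int store_then_fence(int *slot, int value)
{
    assert(slot != NULL);
    *slot = value;
    FULL_FENCE();
    return *slot;
}
```

The empty asm is what keeps the compiler from moving memory accesses
across that point; the builtin is what emits the mfence instruction.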

> 2) If it is not guaranteed to be safe, what is the use of
> __builtin_ia32_mfence()?  What value is there in preventing the *processor*
> from executing statements out of order if the *compiler* is just going to
> move them around?

__builtin_ia32_mfence exists to support the Intel-documented
_mm_mfence intrinsic.  I'm not clear on whether _mm_mfence is meant to
be a compiler memory barrier or not.  If it is, then I think GCC has a
bug in the way it is implemented.  Please feel free to file a bug
report at http://gcc.gnu.org/bugzilla/ , especially if you can come up
with a case that fails.

> I expect this would always work:
>
>     asm ("mfence" ::: "memory");
>
> But I would rather use the builtins if possible.

Yes, you should use the builtins: specifically the __atomic builtins,
which work better and are portable across processors.
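
Often you do not need a standalone fence at all, because the ordering
can be attached to the atomic operation itself.  A minimal sketch,
assuming a one-slot mailbox (struct mailbox, mailbox_put and mailbox_get
are hypothetical names used only for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox shared between a producer and a consumer. */
struct mailbox {
    int value;  /* plain data, written before the flag is set */
    int full;   /* flag, accessed only through the __atomic builtins */
};

/* Producer: write the data, then set the flag with release ordering so
   the data store cannot be reordered after the flag store. */
void mailbox_put(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->value = value;
    __atomic_store_n(&m->full, 1, __ATOMIC_RELEASE);
}

/* Consumer: read the flag with acquire ordering so the data load cannot
   be reordered before it.  Returns 1 and fills *out if a value was
   available, else 0. */
int mailbox_get(struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!__atomic_load_n(&m->full, __ATOMIC_ACQUIRE))
        return 0;
    *out = m->value;
    return 1;
}
```

The same source compiles on any target GCC supports, and the compiler
picks whatever instructions that processor needs.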

Ian