Thanks for your reply.

You mentioned that the statement "is a compiler scheduling barrier for all
expressions that load from or store values to memory".  Does "memory" mean
the main memory?  Or does it include the CPU cache?


----- Original Message ----
From: Ian Lance Taylor <iant@xxxxxxxxxx>
To: Hei Chan <structurechart@xxxxxxxxx>
Cc: gcc-help@xxxxxxxxxxx
Sent: Mon, April 11, 2011 2:42:07 PM
Subject: Re: full memory barrier?

Hei Chan <structurechart@xxxxxxxxx> writes:

> I am a little bit confused what asm volatile ("" : : : "memory") does.
>
> I searched online; many people said that it creates the "full memory barrier".
>
> I have a test code:
>
> int main() {
>   bool bar;
>   asm volatile ("" : : : "memory");
>   bar = true;
>   return 1;
> }
>
> Running g++ -c -g -Wa,-a,-ad foo.cpp gives me:
>
>    2:foo.cpp **** bool bar;
>    3:foo.cpp **** asm volatile ("" : : : "memory");
>   22              .loc 1 3 0
>    4:foo.cpp **** bar = true;
>   23              .loc 1 4 0
>
> It doesn't involve any fence instruction.
>
> Maybe I completely misunderstand the idea of "full memory barrier".

The definition of "memory barrier" is ambiguous when looking at code
written in a high-level language.

The statement "asm volatile ("" : : : "memory");" is a compiler scheduling
barrier for all expressions that load from or store values to memory.
That means something like a pointer dereference, an array index, or an
access to a volatile variable.  It may or may not include a reference to a
local variable, as a local variable need not be in memory.

This kind of compiler scheduling barrier can be used in conjunction with a
hardware memory barrier.  The compiler doesn't know that a hardware memory
barrier is special, and it will happily move memory access instructions
across the hardware barrier.  Therefore, if you want to use a hardware
memory barrier in compiled code, you must use it along with a compiler
scheduling barrier.

On the other hand, a compiler scheduling barrier can be useful even
without a hardware memory barrier.  For example, in a coroutine-based
system with multiple light-weight threads running on a single processor,
you need a compiler scheduling barrier, but you do not need a hardware
memory barrier.

gcc will generate a hardware memory barrier if you use the
__sync_synchronize builtin function.  That function acts as both a
hardware memory barrier and a compiler scheduling barrier.

Ian
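
To make the distinction concrete, here is a minimal sketch of the barrier
flavours discussed above.  It assumes GCC (or a GCC-compatible compiler)
on x86-64; the function names are invented for this sketch and are not
part of any library:

    /* Compiler scheduling barrier only: emits no instruction, but the
       "memory" clobber tells the compiler not to move loads or stores
       across this statement. */
    static inline void compiler_barrier(void)
    {
        asm volatile ("" : : : "memory");
    }

    /* Hardware barrier plus compiler barrier: MFENCE orders loads and
       stores in the processor, and the "memory" clobber keeps the
       compiler from reordering memory accesses around the fence. */
    static inline void hw_barrier_x86(void)
    {
        asm volatile ("mfence" : : : "memory");
    }

    /* Portable equivalent: the builtin emits whatever fence instruction
       the target needs and also acts as a compiler scheduling barrier. */
    static inline void full_barrier(void)
    {
        __sync_synchronize();
    }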
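
And a small usage sketch of where the combined barrier matters.  The
variable and function names here are made up, and it assumes two threads
running on different processors that share these variables:

    int shared_data;
    volatile int ready;   /* 0 until shared_data is valid */

    /* Producer thread: publish shared_data, then set the flag.  The full
       barrier keeps both the compiler and the CPU from reordering the
       two stores, so a consumer that observes ready == 1 also observes
       the new shared_data. */
    void publish(int value)
    {
        shared_data = value;
        __sync_synchronize();
        ready = 1;
    }

In the single-processor, coroutine-based case described above,
compiler_barrier() alone in place of __sync_synchronize() would be
enough, since there is no second CPU whose memory ordering needs to be
constrained.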