Re: __ATOMIC_RELAXED and cache coherency

Jonathan Wakely <jwakely.gcc@xxxxxxxxx> · Mon, 21 Sep 2015 18:01:37 +0100

On 21 September 2015 at 17:39, Victor wrote:
> Hello!
>
> Could someone please explain the following details about
> `__ATOMIC_RELAXED`?
>
> --------
> Q1
>
> If thread `A` performs an atomic store with relaxed memory order, and
> *after that* thread `B` performs an atomic load with relaxed memory
> order, is it guaranteed that `B` gets the actual value, on all
> architectures supported by GCC?

What do you mean by "after that"? How do you know it is after it?

The usual way to know that event T1 happens before event T2 is by
observing the effect of some atomic operation that imposes a
happens-before relationship, but relaxed operations don't do that.

So your question isn't really meaningful.

It is guaranteed that B gets either the original value or the value
written by A (not some other value that appeared out of thin air) but
the whole point of relaxed operations is that they don't impose any
synchronisation that would allow you to say A's store happens before
B's load.
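
To make that concrete, here is a minimal sketch of how you would get a
meaningful "after": publish the data through a release store and wait
for it with an acquire load. The names `payload`, `ready` and
`run_handoff` are made up for illustration, and it assumes POSIX
threads are available.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int payload;  /* written by A, read by B */
static int ready;    /* flag that publishes payload */

/* Thread A: write the data, then publish it with a release store. */
static void *thread_a(void *arg)
{
    (void)arg;
    __atomic_store_n(&payload, 42, __ATOMIC_RELAXED);
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
    return NULL;
}

/* Thread B: spin until the acquire load observes A's release store. */
static void *thread_b(void *arg)
{
    int *out = arg;
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        ;  /* spin */
    /* A's store to payload now happens before this load. */
    *out = __atomic_load_n(&payload, __ATOMIC_RELAXED);
    return NULL;
}

/* Hypothetical driver: runs both threads, returns what B observed. */
int run_handoff(void)
{
    pthread_t a, b;
    int seen = -1;
    int rc;

    payload = 0;
    ready = 0;
    rc = pthread_create(&b, NULL, thread_b, &seen);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

If the flag used `__ATOMIC_RELAXED` on both sides as well, B would
still leave the loop eventually, but nothing would order the load of
`payload` after A's store to it, so B could read either 0 or 42.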

> --------
> Q2
>
> From GCC wiki: https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync
>
>> There is also the presumption that relaxed stores from one thread are
>> seen by relaxed loads in another thread within a reasonable amount of
>> time. That means that on non-cache-coherent architectures, relaxed
>> operations need to flush the cache (although these flushes can be
>> merged across several relaxed operations)
>
>  2.1 I can't figure it out: does this mean that GCC will generate a
>      cache flush, if necessary, or should I flush it myself?

It means GCC might generate one, if needed to ensure that relaxed
stores get seen in a reasonable amount of time.
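
As a rough sketch of what that presumption buys you, a stop flag
polled with relaxed loads is expected to be noticed eventually,
without you writing any flush yourself. The names `stop`, `worker` and
`run_stop_flag` are hypothetical, and this assumes POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int stop;  /* set by the main thread, polled by the worker */

/* Worker: poll the flag with relaxed loads until it becomes non-zero. */
static void *worker(void *arg)
{
    unsigned long *spins = arg;
    while (!__atomic_load_n(&stop, __ATOMIC_RELAXED))
        ++*spins;  /* stand-in for real work */
    return NULL;
}

/* Hypothetical driver: returns 1 once the worker has seen the flag. */
int run_stop_flag(void)
{
    pthread_t t;
    unsigned long spins = 0;
    int rc;

    __atomic_store_n(&stop, 0, __ATOMIC_RELAXED);
    rc = pthread_create(&t, NULL, worker, &spins);
    assert(rc == 0);

    /* Relaxed store: no ordering, but it should become visible soon. */
    __atomic_store_n(&stop, 1, __ATOMIC_RELAXED);

    rc = pthread_join(t, NULL);  /* returns only after the loop exits */
    assert(rc == 0);
    return 1;
}
```

With a plain non-atomic `int` and an ordinary read in the loop, the
compiler would be entitled to hoist the read out of the loop and spin
forever, and the program would have a data race.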

There are better sources of information on atomic operations than old
wiki pages written while the work was still being implemented in GCC.

>  2.2 What is the actual list of those non-cache-coherent architectures,
>      supported by GCC?
>
> --------
> Q3
>
> On which architectures will GCC generate different code for
> `__atomic_load(__ATOMIC_RELAXED)` and a usual read of a `volatile`
> variable?
>
> I've tried these examples:
>
>  E1.
>
>     volatile int data;
>
>     int main() {
>         data;
>         return 0;
>     }
>
>  E2.
>
>     volatile int data;
>
>     int main() {
>         __atomic_load_n(&data, __ATOMIC_RELAXED);
>         return 0;
>     }
>
> For x86/64, arm (arm-none-eabi-gcc without additional options), and
> mipsel (mipsel-linux-gnu-gcc without additional options), GCC generates
> identical code for both examples.

But this is a silly example: it doesn't tell you anything about how
the load might be affected by optimisations or other code around it.

Volatile loads and stores cannot be re-ordered relative to each other by
the compiler, but in general relaxed atomic loads and stores can be
re-ordered, or optimised away entirely. For example the compiler can
remove one of the atomic loads here:

    __atomic_load_n(&data, __ATOMIC_RELAXED);
    __atomic_load_n(&data, __ATOMIC_RELAXED);

It is not allowed to do that for volatile reads.
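
A more useful comparison than E1/E2 would be something like the sketch
below, compiled with optimisation and inspected with `-S`. The
variable and function names are invented for the example, and whether
a particular GCC version actually merges the relaxed loads is not
something I'd promise; the point is only that it is permitted to.

```c
volatile int vdata;  /* accessed through volatile reads */
int adata;           /* accessed through relaxed atomic loads */

/* Both reads of vdata must be emitted, in this order. */
int read_twice_volatile(void)
{
    int a = vdata;
    int b = vdata;
    return a + b;
}

/* The compiler may legally fold these into one load and double it. */
int read_twice_relaxed(void)
{
    int a = __atomic_load_n(&adata, __ATOMIC_RELAXED);
    int b = __atomic_load_n(&adata, __ATOMIC_RELAXED);
    return a + b;
}
```

In a single-threaded run both functions return the same result, so any
difference only shows up in the generated code or when another thread
modifies the variable between the two reads.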