Re: GCC 4.7 and atomic_flag on PowerPC

Lawrence Crowl <crowl@xxxxxxxxxx> · Sat, 31 Mar 2012 22:44:57 -0700

On 3/28/12, Sebastian Huber <sebastian.huber@xxxxxxxxxxxxxxxxxx> wrote:
> I have a small test program for the new <stdatomic.h> features
> which should implement a simple spin lock:
>
> #include <stdatomic.h>
>
> void lock_int(atomic_int *x)
> {
>    int expected = 0;
>    _Bool success;
>
>    do {
>      success = atomic_compare_exchange_weak_explicit(
>        x,
>        &expected,
>        1,
>        memory_order_acquire,
>        memory_order_relaxed
>      );
>    } while (!success);
> }

The normal idiom for using compare-exchange, using your variable
names and memory orders, is:

  expected = atomic_load_explicit( x, memory_order_relaxed );
  do {
    desired = function(expected);
  } while (!atomic_compare_exchange_weak_explicit(x, &expected,
            desired, memory_order_acquire, memory_order_relaxed));

The key thing to note is that when the comparison fails,
compare-exchange updates the value of *expected with the value found.
This update enables efficient and concise expression of computation
of the new value.

So, if in your function, the compare-exchange finds *x with a 1
the first iteration, it changes expected to 1, and it will likely
find it 1 on the second iteration.  Then on the second iteration,
the compare-exchange will pass, even though you have not acquired
the lock.  You need to reset expected on each iteration if you use
compare-exchange this way.

Some compilers can eliminate that reassignment.

I recommend that you do not code spin loops, they can cause really
bad behavior in all but very tightly controled environments.

>
> void unlock_int(atomic_int *x)
> {
>    atomic_store_explicit(x, 0, memory_order_release);
> }
>
> void lock_flag(atomic_flag *x)
> {
>    _Bool flag;
>
>    do {
>      flag = atomic_flag_test_and_set_explicit(
>        x,
>        memory_order_acquire
>      );
>    } while (flag);
> }
>
> void unlock_flag(atomic_flag *x)
> {
>    atomic_flag_clear_explicit(x, memory_order_release);
> }
>
> Here is the generated assembler for PowerPC:
>
>          .file   "atomic.c"
>          .section        ".text"
>          .align 2
>          .globl lock_int
>          .type   lock_int, @function
> lock_int:
>          stwu 1,-24(1)
>          li 9,0
>          li 8,1
>          stw 9,8(1)
> .L3:
>          lwarx 9,0,3
>          lwz 10,8(1)
>          cmpw 0,9,10
>          bne- 0,.L2
>          stwcx. 8,0,3
>          isync
> .L2:
>          stw 9,8(1)
>          bne- 0,.L3
>          addi 1,1,24
>          blr
>          .size   lock_int, .-lock_int
>          .align 2
>          .globl unlock_int
>          .type   unlock_int, @function
> unlock_int:
>          li 9,0
>          sync
>          stw 9,0(3)
>          blr
>          .size   unlock_int, .-unlock_int
>          .align 2
>          .globl lock_flag
>          .type   lock_flag, @function
> lock_flag:
>          rlwinm 5,3,3,27,28
>          li 8,255
>          xori 5,5,24
>          li 11,1
>          rlwinm 3,3,0,0,29
>          li 4,0
>          slw 8,8,5
>          slw 11,11,5
> .L11:
>          lbz 7,0(4)
>          slw 7,7,5
> .L9:
>          lwarx 9,0,3
>          and 6,9,8
>          andc 10,9,8
>          cmpw 0,6,7
>          or 10,10,11
>          bne- 0,.L10
>          stwcx. 10,0,3
>          bne- 0,.L9
> .L10:
>          isync
>          srw 9,9,5
>          stb 9,0(4)
>          beq+ 0,.L11
>          blr
>          .size   lock_flag, .-lock_flag
>          .align 2
>          .globl unlock_flag
>          .type   unlock_flag, @function
> unlock_flag:
>          li 9,0
>          sync
>          stb 9,0(3)
>          blr
>          .size   unlock_flag, .-unlock_flag
>          .ident  "GCC: (GNU) 4.7.0 20120307 (prerelease)"
>
> The unlock_int() and unlock_flag() look quite nice.  Also
> lock_int() is quite good, only the lwz 10,8(1) inside the loop
> seems to be a bit suboptimal.
>
> The lock_flag() is surprisingly complicated.  The problem seems
> to be that atomic_flag is a byte and word operations are used
> to load and store it (lwarx and stwcx).  This requires a lot
> of bit operations.  Why not use int as the type for atomic_flag
> on PowerPC?

I do not know why the implementors chose to do it that way, but
there is a time/space trade off.

-- 
Lawrence Crowl