On 3/28/12, Sebastian Huber <sebastian.huber@xxxxxxxxxxxxxxxxxx> wrote: > I have a small test program for the new <stdatomic.h> features > which should implement a simple spin lock: > > #include <stdatomic.h> > > void lock_int(atomic_int *x) > { > int expected = 0; > _Bool success; > > do { > success = atomic_compare_exchange_weak_explicit( > x, > &expected, > 1, > memory_order_acquire, > memory_order_relaxed > ); > } while (!success); > } The normal idiom for using compare-exchange, using your variable names and memory orders, is: expected = atomic_load_explicit( x, memory_order_relaxed ); do { desired = function(expected); } while (!atomic_compare_exchange_weak_explicit(x, &expected, desired, memory_order_acquire, memory_order_relaxed)); The key thing to note is that when the comparison fails, compare-exchange updates the value of *expected with the value found. This update enables efficient and concise expression of computation of the new value. So, if in your function, the compare-exchange finds *x with a 1 the first iteration, it changes expected to 1, and it will likely find it 1 on the second iteration. Then on the second iteration, the compare-exchange will pass, even though you have not acquired the lock. You need to reset expected on each iteration if you use compare-exchange this way. Some compilers can eliminate that reassignment. I recommend that you do not code spin loops, they can cause really bad behavior in all but very tightly controled environments. > > void unlock_int(atomic_int *x) > { > atomic_store_explicit(x, 0, memory_order_release); > } > > void lock_flag(atomic_flag *x) > { > _Bool flag; > > do { > flag = atomic_flag_test_and_set_explicit( > x, > memory_order_acquire > ); > } while (flag); > } > > void unlock_flag(atomic_flag *x) > { > atomic_flag_clear_explicit(x, memory_order_release); > } > > Here is the generated assembler for PowerPC: > > .file "atomic.c" > .section ".text" > .align 2 > .globl lock_int > .type lock_int, @function > lock_int: > stwu 1,-24(1) > li 9,0 > li 8,1 > stw 9,8(1) > .L3: > lwarx 9,0,3 > lwz 10,8(1) > cmpw 0,9,10 > bne- 0,.L2 > stwcx. 8,0,3 > isync > .L2: > stw 9,8(1) > bne- 0,.L3 > addi 1,1,24 > blr > .size lock_int, .-lock_int > .align 2 > .globl unlock_int > .type unlock_int, @function > unlock_int: > li 9,0 > sync > stw 9,0(3) > blr > .size unlock_int, .-unlock_int > .align 2 > .globl lock_flag > .type lock_flag, @function > lock_flag: > rlwinm 5,3,3,27,28 > li 8,255 > xori 5,5,24 > li 11,1 > rlwinm 3,3,0,0,29 > li 4,0 > slw 8,8,5 > slw 11,11,5 > .L11: > lbz 7,0(4) > slw 7,7,5 > .L9: > lwarx 9,0,3 > and 6,9,8 > andc 10,9,8 > cmpw 0,6,7 > or 10,10,11 > bne- 0,.L10 > stwcx. 10,0,3 > bne- 0,.L9 > .L10: > isync > srw 9,9,5 > stb 9,0(4) > beq+ 0,.L11 > blr > .size lock_flag, .-lock_flag > .align 2 > .globl unlock_flag > .type unlock_flag, @function > unlock_flag: > li 9,0 > sync > stb 9,0(3) > blr > .size unlock_flag, .-unlock_flag > .ident "GCC: (GNU) 4.7.0 20120307 (prerelease)" > > The unlock_int() and unlock_flag() look quite nice. Also > lock_int() is quite good, only the lwz 10,8(1) inside the loop > seems to be a bit suboptimal. > > The lock_flag() is surprisingly complicated. The problem seems > to be that atomic_flag is a byte and word operations are used > to load and store it (lwarx and stwcx). This requires a lot > of bit operations. Why not use int as the type for atomic_flag > on PowerPC? I do not know why the implementors chose to do it that way, but there is a time/space trade off. -- Lawrence Crowl