On 29/06/17 09:47, Andrew Haley wrote:
> Exactly. The fact that people can mess up is no excuse for GCC not
> providing an intrinsic for double-word CAS.

So, I was originally going to write that DWCAS *is* provided, and that
what is missing is *only* the compiler predefine for _SYNC_16.

However, I've just been playing around with a bit of test code and
looking at the assembly output, and I am now utterly confused. This is a
good thing, because before I was totally confused *but didn't know it* :-)

It looks like libatomic is emitting a non-atomic DWCAS. (In which case,
if libatomic in fact does not support DWCAS, then the lack of _SYNC_16
looks correct!)

I'm testing on aarch64, and this is the test code (where I vary the type
of the variables to perform different lengths of CAS);

  #include <stdio.h>
  #include <stdlib.h>

  int main( void );

  int main()
  {
    __int128 __attribute__ ( (aligned(16)) )
      target = 1,
      compare = 1,
      exchange = 2;

    __atomic_compare_exchange_n( &target, &compare, exchange, 0,
                                 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST );

    printf( "%d\n", (int) target );

    return( EXIT_SUCCESS );
  }

This is how I dump assembly;

  objdump -d -M -S a.out

Now, I don't know ARM assembly, so it may well be that I'm just not
understanding what I'm looking at; please keep that in mind and be
forgiving!

So, starting with the OS-provided GCC 4.9.2;

1. calling GCC without -latomic fails at link time with

  test.c:(.text+0x50): undefined reference to `__atomic_compare_exchange_16'

2. calling GCC with -latomic compiles and gives this in the disassembly;

  0000000000400500 <__atomic_compare_exchange_16@plt-0x20>:
    400500: a9bf7bf0  stp  x16, x30, [sp,#-16]!
    400504: 90000090  adrp x16, 410000 <__FRAME_END__+0xf840>
    400508: f944f211  ldr  x17, [x16,#2528]
    40050c: 91278210  add  x16, x16, #0x9e0
    400510: d61f0220  br   x17
    400514: d503201f  nop
    400518: d503201f  nop
    40051c: d503201f  nop

and the following in main;

    400700: 97ffff88  bl   400520 <__atomic_compare_exchange_16@plt>

(I believe the usual initial PLT fix-up is occurring, which is why the
first listing above is at -0x20.)

As far as I can tell, there is nowhere in the entire disassembly (and
certainly not in the function call above) any use of the "X" type
load/store instructions, which are the ones used for
load-linked/store-conditional. In other words, it seems to be a
non-atomic DWCAS, and indeed *nothing* atomic is going on anywhere (but
there is a very high chance I've just *missed* it, since I don't know
ARM).

3. changing the code to use int long long unsigned (64-bit), staying
with 4.9.2, I can compile and link without -latomic. This gives the
following disassembly, which is in main proper and looks right;

    400650: c85ffc41  ldaxr x1, [x2]
    400654: eb03003f  cmp   x1, x3
    400658: 54000061  b.ne  400664 <main+0x44>
    40065c: c805fc44  stlxr w5, x4, [x2]
    400660: 35ffff85  cbnz  w5, 400650 <main+0x30>

4. I have the same outcome with my hand-built GCC 7.1.0.
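
For anyone wanting to repeat the comparison, here is a rough sketch of the 64-bit variant from point 3, plus two small probes for what the compiler claims about 16-byte atomics. The helper names (cas_u64, have_sync_cas_16, dwcas_always_lock_free, self_check) are made up for illustration, and I'm assuming the predefine abbreviated above as _SYNC_16 is __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16. None of this needs -latomic, since only the 8-byte CAS is actually executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: strong 64-bit CAS with sequentially consistent
   ordering. Returns true if *target was equal to *compare and has been
   replaced by exchange; on failure, *compare is updated with the value
   actually found in *target. */
static bool cas_u64( uint64_t *target, uint64_t *compare, uint64_t exchange )
{
  return __atomic_compare_exchange_n( target, compare, exchange,
                                      0, /* strong, not weak */
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST );
}

/* Hypothetical probe: 1 if the compiler defines the 16-byte
   compare-and-swap predefine (assumed to be what _SYNC_16 refers to). */
static int have_sync_cas_16( void )
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical probe: 1 if the compiler says 16-byte objects with
   typical alignment are always lock-free on this target. */
static int dwcas_always_lock_free( void )
{
  return __atomic_always_lock_free( 16, 0 ) ? 1 : 0;
}

/* Mirrors the test program: target = 1, compare = 1, exchange = 2.
   Returns the final value of target. */
static uint64_t self_check( void )
{
  uint64_t target = 1, compare = 1, exchange = 2;

  bool swapped = cas_u64( &target, &compare, exchange );
  assert( swapped );

  return target;
}
```

Printing the results of have_sync_cas_16() and dwcas_always_lock_free() alongside the disassembly for each compiler version should at least show whether the compiler itself thinks it can inline a 16-byte CAS, or whether it intends to hand the job to a library call.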