On 28 May 2017 at 12:15, Toebs Douglass wrote:
> I would like to write a little about the new libatomic mechanism for
>
> What I see however is that there is a way for me to avoid these costs
> and return to the simple situation.  My code has an abstraction layer,
> and I can implement inline assembly for double-word CAS on 64-bit
> platforms and use that instead of __atomic and __sync.

On x86_64 can't you just use __sync_val_compare_and_swap with -mcx16?

Since GCC 4.6 this always emits cmpxchg16b when compiled with -mcx16:

    int main()
    {
      __int128 i = 0;
      __sync_val_compare_and_swap(&i, 0, 1);
    }

That still works with GCC 7.
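
For what it's worth, here is a minimal sketch of how an abstraction layer might wrap the builtin rather than carry inline assembly. The names dw_t, DWCAS_AVAILABLE and dwcas are made up for illustration and are not taken from your code. It assumes GCC or a compatible compiler, and it keys off the predefined macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, which GCC defines on x86_64 when -mcx16 is given. Without that macro the sketch falls back to a pointer-sized CAS purely so that it still builds; that fallback is my assumption, not something from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical abstraction-layer names, for illustration only. */
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
#define DWCAS_AVAILABLE 1
typedef unsigned __int128 dw_t;   /* double word on a 64-bit target */
#else
#define DWCAS_AVAILABLE 0
typedef uintptr_t dw_t;           /* assumed fallback: single-word CAS */
#endif

/*
 * Atomically replace *target with `desired` if it currently holds
 * `expected`.  Returns 1 on success, 0 if *target held another value.
 */
static inline int dwcas(volatile dw_t *target, dw_t expected, dw_t desired)
{
    /* cmpxchg16b needs a 16-byte aligned operand; check natural alignment. */
    assert(((uintptr_t)target % sizeof(dw_t)) == 0);

    return __sync_bool_compare_and_swap(target, expected, desired);
}
```

Built with something like "gcc -O2 -mcx16", a call such as dwcas(&v, 0, 1) on a zero-initialised dw_t v succeeds and leaves v equal to 1, and a second call with the same expected value then fails and leaves v unchanged. DWCAS_AVAILABLE lets the rest of the layer tell at compile time whether the 16-byte path was selected.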