Markus Henschel <markus.henschel@xxxxxxxx> writes:

> │0x8048397 <main+71> xchg %ebx,%edi │
> >│0x8048399 <main+73> lock cmpxchg8b 0x2c(%ebx) │
> │0x80483a1 <main+81> xchg %ebx,%edi │

It looks like GCC 4.2.4 has used the wrong register in the
cmpxchg8b instruction. The earlier part of this function uses
EBX to point to the Global Offset Table. Because the cmpxchg8b
instruction requires that the ECX:EBX register pair contains the
64-bit value to be stored in memory, GCC has temporarily moved
the address of the GOT from EBX to EDI; so it should have used
0x2c(%edi) in the cmpxchg8b instruction. The generated code
crashes because it tries to use the low 32 bits of the data value
as the address of the GOT.

A similar bug has already been reported:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37651
"__sync_bool_compare_and_swap creates wrong code with -fPIC"

It too is about using %ebx as a pointer in cmpxchg8b.
I can reproduce that bug with 4.2.4 but not with 4.4.6 or 4.6.2;
so I guess the report could be closed now.

Omitting -fPIC avoids the bug because accessing the foo variable
then does not need the GOT and the cmpxchg8b instruction uses a
literal address that does not involve any register.

With your original source, omitting -fvisibility=hidden avoids
the bug because it prevents foo_instance and bar from being
inlined and the out-of-line code generated for bar accesses
foo->_foobar via the pointer parameter, rather than via the GOT.
With the simplified source, -fvisibility=hidden does not matter
for the bug.

For a workaround, I suggest

  int main(int argc, char * argv[])
  {
    static long long foo;
    long long *volatile fooptr = &foo;
    return __sync_add_and_fetch(fooptr, foo);
  }

as this seems to force GCC 4.2.4 to compute the address first,
rather than try to access the variable via the GOT as part of
the cmpxchg8b instruction.
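
The same idea can be wrapped in a small helper so that it is usable
outside main. This is only a sketch: the names counter and
add_to_counter are hypothetical and do not come from the original
source. Whether it avoids the bad cmpxchg8b operand on a particular
GCC version would still need to be confirmed by inspecting the
generated assembly (for example with -S -fPIC -m32).

```c
/* Hypothetical 64-bit counter with static storage, standing in for
   the `foo` variable from the workaround above. */
static long long counter;

/* Hypothetical helper: atomically adds `delta` to `counter` and
   returns the updated value. */
long long add_to_counter(long long delta)
{
    /* Copying the address into a volatile pointer makes the compiler
       load the pointer value from that object before the atomic
       sequence, instead of folding a GOT-relative address into the
       instruction operand. */
    long long *volatile ptr = &counter;

    /* GCC builtin: atomic add that returns the new value. On 32-bit
       x86 a 64-bit operand is implemented with a cmpxchg8b loop. */
    return __sync_add_and_fetch(ptr, delta);
}
```

A caller would simply write add_to_counter(1) wherever the original
code used __sync_add_and_fetch directly on the variable.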