I have come across an unusual scenario and am not sure whether it was a compiler bug that is now fixed or whether something is not quite right with the code.

In the code below, when optimization (such as -O2) is used with gcc 4.1.2, the whole p array is kept in a single 64 bit register (%rdx). The inline assembly in the first STORE32H operates on the 32 bit half (%edx), which zero extends away the upper 32 bits of %rdx, so the second STORE32H gets 0x00000000 for its value. Attached below is a stripped down test case and the resulting assembly code for each compiler and its options.

Modifiers that fix it for 4.1.2: use -m32, or use no optimizations.

This works fine in gcc 4.2.3 on the same machine and on another Linux OS (I haven't tried the newest gcc version yet). It still optimizes the code but doesn't use the 64 bit register for both values.

My understanding of inline assembly is that the compiler is responsible for protecting the registers it allocates dynamically for the operands, and therefore putting something in the clobber list doesn't help.

I searched the gcc bugzilla extensively and haven't seen anything that specifically addresses this. It may have been fixed as a side effect of something else, but I didn't want to file a bug since it works in a newer version.

Thanks for any info,
Derek

Hardware: AMD Phenom Quad core 64 bit

Test code:

//*************************************************
// test.c
#include <stdio.h>

typedef unsigned ulong32;

#define STORE32H(x, y) \
asm __volatile__ ( \
   "bswapl %0 \n\t" \
   "movl %0,(%1)\n\t" \
   "bswapl %0 \n\t" \
   ::"r"(x), "r"(y));

static void pxor(ulong32 *p)
{
   p[1] ^= p[0];
}

int main(void)
{
   ulong32 p[2] = {0x00010001, 0xaaaaaaaa};
   unsigned char ctt[8];

   pxor(p);

   STORE32H(p[0], ctt);
   STORE32H(p[1], ctt+4);

   //Should be 0x00010001
   printf("ctt: 0x%02x%02x%02x%02x", ctt[0],ctt[1],ctt[2],ctt[3]);
   //Should be 0xaaabaaab
   printf(" 0x%02x%02x%02x%02x\n", ctt[4],ctt[5],ctt[6],ctt[7]);
   printf(" sizeof ulong32 should be 4: %lu\n", sizeof(ulong32));

   return 0;
}
//*******************************************************************

GCC 4.1.2

gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

cc -g -Wall -W -O2 -c -o test.o test.c

main:
0x00000000004004c0 <main+0>:  sub    $0x18,%rsp
0x00000000004004c4 <main+4>:  mov    $0xaaabaaab00010001,%rdx
0x00000000004004ce <main+14>: mov    %rsp,%rax
0x00000000004004d1 <main+17>: bswap  %edx
0x00000000004004d3 <main+19>: mov    %edx,(%rsp)
0x00000000004004d6 <main+22>: bswap  %edx
0x00000000004004d8 <main+24>: shr    $0x20,%rdx
0x00000000004004dc <main+28>: add    $0x4,%rax
0x00000000004004e0 <main+32>: bswap  %edx
0x00000000004004e2 <main+34>: mov    %edx,(%rax)
0x00000000004004e4 <main+36>: bswap  %edx

with GCC 4.2.3

/tmp/usr/local/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ./configure
Thread model: posix
gcc version 4.2.3

/tmp/usr/local/bin/gcc -g -Wall -W -O2 -c -o test.o test.c

0x0000000000400480 <main+0>:  sub    $0x18,%rsp
0x0000000000400484 <main+4>:  mov    $0x10001,%edx
0x0000000000400489 <main+9>:  mov    %rsp,%rax
0x000000000040048c <main+12>: bswap  %edx
0x000000000040048e <main+14>: mov    %edx,(%rsp)
0x0000000000400491 <main+17>: bswap  %edx
0x0000000000400493 <main+19>: mov    $0xaaabaaab,%edx
0x0000000000400498 <main+24>: add    $0x4,%rax
0x000000000040049c <main+28>: bswap  %edx
0x000000000040049e <main+30>: mov    %edx,(%rax)
0x00000000004004a0 <main+32>: bswap  %edx
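
One thing that may be worth checking against the listings above: on x86-64 a 32 bit write such as bswapl to %edx clears the upper half of %rdx, while STORE32H lists x only as an input operand, which an asm statement is not supposed to modify. What follows is only a minimal sketch of a variant that tells the compiler the register is changed, by using a read-write ("+r") operand on a private copy. The name store32h_rw, the function form, the "memory" clobber, and the non-x86 fallback are assumptions of this sketch and are not part of the original test case.

```c
#include <stdint.h>

/* Hypothetical variant of STORE32H: store x in big-endian order at out.
 * The value is copied into a local that is declared as a read-write
 * ("+r") operand, so the compiler knows the asm modifies that register
 * and will not keep other live data in it. */
static inline void store32h_rw(uint32_t x, unsigned char *out)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t tmp = x;                /* private copy the asm may change */
    __asm__ __volatile__ (
        "bswapl %0        \n\t"      /* byte-swap the 32 bit copy */
        "movl   %0, (%1)  \n\t"      /* write the 4 swapped bytes to out */
        : "+r"(tmp)                  /* read-write: register is modified */
        : "r"(out)                   /* address to store to */
        : "memory");                 /* the asm writes through out */
#else
    /* Portable fallback for non-x86 targets (assumption of this sketch). */
    out[0] = (unsigned char)(x >> 24);
    out[1] = (unsigned char)(x >> 16);
    out[2] = (unsigned char)(x >> 8);
    out[3] = (unsigned char)(x);
#endif
}
```

In the test case this would be called as store32h_rw(p[0], ctt) and store32h_rw(p[1], ctt+4). Because tmp is a copy, the second bswapl that restored the register in the original macro is no longer needed.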