On Thu, Jun 16, 2011 at 2:58 PM, Jeffrey Walton <noloader@xxxxxxxxx> wrote:
> Hi All,
>
> I observed unexpected results while testing an optimized x64
> routine (using -g and -O2). It seems on x86_64, adds can be optimized
> as follows (using LEA):
>
> # Adding 10 + 10
> ...
> 0x0000000100000d44 <main+20>: mov esi,0xa
> 0x0000000100000d49 <main+25>: mov edi,0xa
> 0x0000000100000d4e <main+30>: call 0x100000ce0 <sadd>
>
> # int sadd(int a, int b, int* r);
> # result is really a bool, {pro|epi}logue omitted
> 0x0000000100000ce4 <sadd+4>: jo 0x100000ce8 <sadd+8>
> 0x0000000100000ce6 <sadd+6>: jmp 0x100000ce8 <sadd+8>
> 0x0000000100000ce8 <sadd+8>: test rdx,rdx
> 0x0000000100000ceb <sadd+11>: je 0x100000cf2 <sadd+18>
> 0x0000000100000ced <sadd+13>: lea eax,[rsi+rdi]
> 0x0000000100000cf0 <sadd+16>: mov DWORD PTR [rdx],eax
> 0x0000000100000cf2 <sadd+18>: xor eax,eax
>
> It seems an add is performed via the load effective address:
>   lea eax,[rsi+rdi]
>
> Worse, the test for overflow is performed before the add:
>   jo 0x100000ce8 <sadd+8>
>   ....
>   lea eax,[rsi+rdi]
>
> Any ideas on what knob turning I should perform? The C/C++ source for
> `sadd` is below. There's not much to it - just a test on an x86 flag.
>

It looks like overflow and result needed to be declared as volatile to
enforce ordering:

int add_i32(int a, int b, int* r)
{
    volatile int overflow = 0;
    volatile int result = a + b;

    asm volatile("jo 1f");
    asm volatile("jmp 2f");

    asm volatile("1:");
    overflow = 1;

    asm volatile("2:");

    if(r)
        *r = result;

    return !overflow;
}
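
For comparison, here is a sketch of the same check that does not depend on the flags register at all. The volatile approach relies on the compiler emitting an ADD and leaving EFLAGS untouched until the `jo`, which it does not promise (LEA, for one, does not modify flags), and as far as I know GCC does not support jumping from one asm statement into another. The names `add_i32_checked` and `add_i32_builtin` are mine, not from the thread. Unlike `add_i32` above, the first function leaves `*r` untouched on overflow, since computing `a + b` in that case would be undefined behavior. The second uses `__builtin_add_overflow`, which is only available in newer compilers (GCC 5 and later, recent Clang), so it is guarded.

```c
#include <limits.h>

/* Portable pre-check: decide whether a + b would overflow BEFORE adding.
 * Returns 1 on success, 0 on overflow (same convention as add_i32).
 * On overflow *r is left untouched (this differs from the original). */
int add_i32_checked(int a, int b, int *r)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* would go below INT_MIN */
        return 0;

    if (r)
        *r = a + b;                     /* safe: cannot overflow here */
    return 1;
}

#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
/* Compiler-assisted variant. The builtin returns nonzero on overflow and
 * always writes the wrapped sum, so *r is stored unconditionally here,
 * as in the original add_i32. */
int add_i32_builtin(int a, int b, int *r)
{
    int sum;
    int overflowed = __builtin_add_overflow(a, b, &sum);

    if (r)
        *r = sum;
    return !overflowed;
}
#endif
```

With either version, `add_i32_checked(10, 10, &r)` returns 1 and stores 20, while `add_i32_checked(INT_MAX, 1, &r)` returns 0. On x86_64 the builtin typically compiles down to an add followed by a flag test, which is what the inline asm was trying to get by hand.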