Re: How to force an ADD asm instruction (x64)?

Jeffrey Walton <noloader@xxxxxxxxx> · Thu, 16 Jun 2011 16:17:50 -0400



On Thu, Jun 16, 2011 at 2:58 PM, Jeffrey Walton <noloader@xxxxxxxxx> wrote:
> Hi All,
>
> I observed unexpected results while testing an optimized x64
> routine (using -g and -O2). It seems on x86_64, adds can be optimized
> as follows (using LEA):
>
> # Adding 10 + 10
> ...
> 0x0000000100000d44 <main+20>:   mov    esi,0xa
> 0x0000000100000d49 <main+25>:   mov    edi,0xa
> 0x0000000100000d4e <main+30>:   call   0x100000ce0 <sadd>
>
> #    int sadd(int a, int b, int* r);
> #    result is really a bool, {pro|epi}logue omitted
> 0x0000000100000ce4 <sadd+4>:    jo     0x100000ce8 <sadd+8>
> 0x0000000100000ce6 <sadd+6>:    jmp    0x100000ce8 <sadd+8>
> 0x0000000100000ce8 <sadd+8>:    test   rdx,rdx
> 0x0000000100000ceb <sadd+11>:   je     0x100000cf2 <sadd+18>
> 0x0000000100000ced <sadd+13>:   lea    eax,[rsi+rdi]
> 0x0000000100000cf0 <sadd+16>:   mov    DWORD PTR [rdx],eax
> 0x0000000100000cf2 <sadd+18>:   xor    eax,eax
>
> It seems an add is performed via the load effective address:
>    lea    eax,[rsi+rdi]
>
> Worse, the test for overflow is performed before the add:
>    jo     0x100000ce8 <sadd+8>
>    ....
>    lea    eax,[rsi+rdi]
>
> Any ideas on what knob turning I should perform? The C/C++ source for
> `sadd` is below. There's not much to it - just a test on an x86 flag.
>
It looks like `overflow` and `result` needed to be declared volatile to
enforce ordering, so that the add is emitted before the `jo`:

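int add_i32(int a, int b, int* r)
{
	/* volatile keeps the compiler from reordering or folding these */
	volatile int overflow = 0;
	volatile int result = a + b;

	/* OF is expected to still hold the flags from the add above; the
	   compiler does not track flags between separate asm statements */
	asm volatile("jo 1f");
	asm volatile("jmp 2f");

	/* overflow path */
	asm volatile("1:");
	overflow = 1;

	/* common exit */
	asm volatile("2:");
	if(r)
		*r = result;

	/* 1 on success, 0 on overflow */
	return !overflow;
}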
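
For comparison, here is a rough sketch of a variant that does not depend on where the compiler places the add relative to the `jo`. It keeps the ADD and the flag read inside a single asm statement, so nothing can be scheduled between them. It assumes GCC or Clang with the default AT&T syntax on x86/x86_64; the name `add_i32_checked` is made up for illustration, and the `#else` branch falls back to `__builtin_add_overflow`, which needs GCC 5 or later (or a recent Clang). Treat it as one possible approach, not a drop-in replacement.

```c
/* Sketch only: add_i32_checked is a hypothetical name, not from the
   original post. Returns 1 on success, 0 on signed overflow. */
int add_i32_checked(int a, int b, int *r)
{
	int result = a;
	unsigned char overflow;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/* ADD and SETO live in one asm statement, so the overflow flag
	   read by SETO is always the one produced by this ADD. */
	__asm__ ("addl %2, %0\n\t"
	         "seto %1"
	         : "+r" (result), "=q" (overflow)
	         : "r" (b)
	         : "cc");
#else
	/* Assumed fallback for other targets: compiler builtin that
	   stores the wrapped sum and reports overflow. */
	overflow = (unsigned char)__builtin_add_overflow(a, b, &result);
#endif

	if (r)
		*r = result;

	return !overflow;
}
```

With this shape no volatile locals are needed, since the compiler only sees the asm outputs and never has to preserve the flags across statements.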