Re: Inline asm for ARM

Andrew Haley <aph@xxxxxxxxxx> · Wed, 16 Jun 2010 18:22:36 +0100

On 06/16/2010 06:12 PM, Pavel Pavlov wrote:
>> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx] On
>> Behalf Of Andrew Haley
>>
>>> By the way, the version that takes hi:lo for the first int64 works fine:
>>>
>>> static __inline void smlalbb(int * lo, int * hi, int x, int y) {
>>> #if defined(__CC_ARM)
>>> 	__asm { smlalbb *lo, *hi, x, y; }
>>> #elif defined(__GNUC__)
>>> 	__asm__ __volatile__("smlalbb %0, %1, %2, %3" : "+r"(*lo), "+r"(*hi)
>>> 		: "r"(x), "r"(y));
>>> #endif
>>> }
>>>
>>>
>>> void test_smlalXX(int hi, int lo, int a, int b) {
>>> 	smlalbb(&hi, &lo, a, b);
>>> 	smlalbt(&hi, &lo, a, b);
>>> 	smlaltb(&hi, &lo, a, b);
>>> 	smlaltt(&hi, &lo, a, b);
>>> }
>>>
>>> Translates directly into four asm opcodes
>>
>> Mmmm, but the volatile is wrong.  If you need volatile to stop gcc
>> from deleting your asm, you have a mistake somewhere.
> 
> I had to add volatile when I had that mess with "=&r" and "0", now I
> think it might be removed.

> Just tested, and I still need that. The reason I needed that was
> because my test function was a noop:

> void test_smlalXX(int lo, int hi, int a, int b)
> {
> 	smlalbb(&lo, &hi, a, b);
> 	smlalbt(&lo, &hi, a, b);
> 	smlaltb(&lo, &hi, a, b);
> 	smlaltt(&lo, &hi, a, b);
> }

> Gcc correctly guesses that there is no side effect from that
> function if I don't use volatile.  So, I removed volatile and added
> a return value to that function:
> 
> uint64_t test_smlalXX(int lo, int hi, int a, int b)
> {
> 	smlalbb(&lo, &hi, a, b);
> 	smlalbt(&lo, &hi, a, b);
> 	smlaltb(&lo, &hi, a, b);
> 	smlaltt(&lo, &hi, a, b);
> 
> 	T64 retval;
> 	
> 	retval.s.hi = hi;
> 	retval.s.lo = lo;
> 	return retval.i64;
> }
> 
> The output becomes:
> 000000e4 <_Z12test_smlalXXiiii>:
>   e4:	e92d0030 	push	{r4, r5}
>   e8:	e1410382 	smlalbb	r0, r1, r2, r3
>   ec:	e14103c2 	smlalbt	r0, r1, r2, r3
>   f0:	e14103a2 	smlaltb	r0, r1, r2, r3
>   f4:	e1a05001 	mov	r5, r1
>   f8:	e14503e2 	smlaltt	r0, r5, r2, r3
>   fc:	e1a04000 	mov	r4, r0
>  100:	e1a01005 	mov	r1, r5
>  104:	e8bd0030 	pop	{r4, r5}
>  108:	e12fff1e 	bx	lr
> 
> Basically, gcc gets confused about the return variable and generates
> useless gunk at the end for the last function. I tried commenting out
> smlaltt(&lo, &hi, a, b); in test_smlalXX, and gcc still
> generates that same useless code around smlaltb.

I have seen something similar with higher optimization levels, where
some pass messes things up a bit.  Your

 	mov	r4, r0

is very weird, though.  I can't explain that.

-O1 generates perfect code for me, though.
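
To illustrate the point about volatile, here is a minimal sketch, not
the exact code from this thread.  It assumes that the "+r" constraints
are enough to describe the data flow, so the asm survives as long as its
results are used.  The name mac_bb, the shift-based packing (used in
place of T64, whose definition was not posted) and the portable fallback
branch are my own additions for illustration; the fallback only exists
so the same file builds on a non-ARM host.

```c
#include <stdint.h>

/* hi:lo += (bottom 16 bits of x) * (bottom 16 bits of y), signed.
   No volatile: "+r" tells GCC the asm reads and writes *lo and *hi,
   so it is kept whenever those values are used afterwards. */
static inline void smlalbb(int *lo, int *hi, int x, int y)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    __asm__("smlalbb %0, %1, %2, %3"
            : "+r"(*lo), "+r"(*hi)
            : "r"(x), "r"(y));
#else
    /* Hypothetical fallback for hosts without the instruction;
       it performs the same arithmetic in plain C. */
    uint64_t acc = ((uint64_t)(uint32_t)*hi << 32) | (uint32_t)*lo;
    acc += (uint64_t)(int64_t)((int16_t)x * (int16_t)y);
    *lo = (int)(uint32_t)acc;
    *hi = (int)(uint32_t)(acc >> 32);
#endif
}

/* Hypothetical caller: returning the accumulator makes the result
   live, so the optimizer has no reason to delete the asm. */
uint64_t mac_bb(int lo, int hi, int a, int b)
{
    smlalbb(&lo, &hi, a, b);
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}
```

If the caller throws the result away, as in the void version of the
test function, GCC is entitled to drop the asm, and that is the
behaviour you want from a pure computation.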

Andrew.