RE: Inline asm for ARM

Pavel Pavlov <pavel@xxxxxxxxxxxxxx> · Wed, 16 Jun 2010 13:26:52 -0400

> -----Original Message-----
> From: Andrew Haley [mailto:aph@xxxxxxxxxx]
> Sent: Wednesday, June 16, 2010 13:23
> To: Pavel Pavlov
> Cc: gcc-help@xxxxxxxxxxx
> Subject: Re: Inline asm for ARM
> 
> On 06/16/2010 06:12 PM, Pavel Pavlov wrote:
> >> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx]
> >> On Behalf Of Andrew Haley
> >>
> >>> By the way, the version that takes hi:lo for the first int64 works fine:
> >>>
> >>> static __inline void smlalbb(int * lo, int * hi, int x, int y)
> >>> {
> >>> #if defined(__CC_ARM)
> >>> 	__asm { smlalbb *lo, *hi, x, y; }
> >>> #elif defined(__GNUC__)
> >>> 	__asm__ __volatile__("smlalbb %0, %1, %2, %3"
> >>> 		: "+r"(*lo), "+r"(*hi)
> >>> 		: "r"(x), "r"(y));
> >>> #endif
> >>> }
> >>>
> >>>
> >>> void test_smlalXX(int hi, int lo, int a, int b) {
> >>> 	smlalbb(&hi, &lo, a, b);
> >>> 	smlalbt(&hi, &lo, a, b);
> >>> 	smlaltb(&hi, &lo, a, b);
> >>> 	smlaltt(&hi, &lo, a, b);
> >>> }
> >>>
> >>> It translates directly into four asm opcodes.
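To check wrappers like the one above on a host without ARM hardware, a portable C model of the four SMLAL<x><y> variants can be handy. This is a minimal sketch, assuming the ARM semantics of these instructions: the first letter picks the bottom (bits 15:0) or top (bits 31:16) half of the first multiplicand, the second letter does the same for the second, and the signed 16x16 product is sign-extended and added to the 64-bit accumulator RdHi:RdLo. The names half16, smlal_ref and smlal??_ref are hypothetical and not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits 15:0 (top == 0) or bits 31:16 (top != 0)
 * of v and sign-extend them to 32 bits without implementation-defined casts. */
static int32_t half16(int32_t v, int top)
{
    uint32_t u = (uint32_t)v;
    uint32_t h = top ? (u >> 16) : (u & 0xFFFFu);
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Model of SMLAL<x><y>: acc += half(x) * half(y). The product of two signed
 * 16-bit values always fits in 32 bits; the 64-bit addition is done in
 * unsigned arithmetic so it wraps modulo 2^64, as the hardware does. */
static uint64_t smlal_ref(uint64_t acc, int32_t x, int32_t y, int xtop, int ytop)
{
    int32_t prod = half16(x, xtop) * half16(y, ytop);
    return acc + (uint64_t)(int64_t)prod;
}

/* The four variants: the first letter selects the half of x, the second the half of y. */
static uint64_t smlalbb_ref(uint64_t acc, int32_t x, int32_t y) { return smlal_ref(acc, x, y, 0, 0); }
static uint64_t smlalbt_ref(uint64_t acc, int32_t x, int32_t y) { return smlal_ref(acc, x, y, 0, 1); }
static uint64_t smlaltb_ref(uint64_t acc, int32_t x, int32_t y) { return smlal_ref(acc, x, y, 1, 0); }
static uint64_t smlaltt_ref(uint64_t acc, int32_t x, int32_t y) { return smlal_ref(acc, x, y, 1, 1); }
```

Comparing (uint64_t)hi << 32 | (uint32_t)lo after each asm call against these functions gives a quick correctness check for the constraint strings.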
> >>
> >> Mmmm, but the volatile is wrong.  If you need volatile to stop gcc
> >> from deleting your asm, you have a mistake somewhere.
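Following that advice, here is a minimal sketch of the split hi/lo wrapper without __volatile__. It assumes the ACLE macro __ARM_FEATURE_DSP to detect that SMLALBB is available on 32-bit ARM; on any other target it compiles a portable fallback (an addition for illustration, not part of the original code) so the wrapper can be exercised on a host. Since the asm has no effect other than its outputs, GCC is free to delete it when the caller never reads *lo or *hi, which is the correct behaviour.

```c
#include <stdint.h>

static inline void smlalbb(int32_t *lo, int32_t *hi, int32_t x, int32_t y)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    /* "+r": *lo and *hi are both read and written by the instruction.
     * No volatile: if the results are unused, deleting the asm is fine. */
    __asm__("smlalbb %0, %1, %2, %3"
            : "+r"(*lo), "+r"(*hi)
            : "r"(x), "r"(y));
#else
    /* Portable fallback: (hi:lo) += (int16)x * (int16)y, modulo 2^64. */
    uint64_t acc = ((uint64_t)(uint32_t)*hi << 32) | (uint32_t)*lo;
    int32_t xb = (int32_t)(x & 0xFFFF), yb = (int32_t)(y & 0xFFFF);
    if (xb & 0x8000) xb -= 0x10000;   /* sign-extend bits 15:0 of x */
    if (yb & 0x8000) yb -= 0x10000;   /* sign-extend bits 15:0 of y */
    acc += (uint64_t)(int64_t)(xb * yb);
    *lo = (int32_t)(uint32_t)acc;
    *hi = (int32_t)(uint32_t)(acc >> 32);
#endif
}
```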
> >
> > I had to add volatile back when I had that mess with "=&r" and "0"; now I
> > think it can be removed.
> 
> > Just tested, and I still need it. The reason I needed it was
> > that my test function was a no-op:
> 
> > void test_smlalXX(int lo, int hi, int a, int b) {
> > 	smlalbb(&lo, &hi, a, b);
> > 	smlalbt(&lo, &hi, a, b);
> > 	smlaltb(&lo, &hi, a, b);
> > 	smlaltt(&lo, &hi, a, b);
> > }
> 
> > GCC correctly determines that the function has no side effects
> > if I don't use volatile. So I removed volatile and added a return value to
> > that function:
> >
> > uint64_t test_smlalXX(int lo, int hi, int a, int b) {
> > 	smlalbb(&lo, &hi, a, b);
> > 	smlalbt(&lo, &hi, a, b);
> > 	smlaltb(&lo, &hi, a, b);
> > 	smlaltt(&lo, &hi, a, b);
> >
> > 	T64 retval;
> >
> > 	retval.s.hi = hi;
> > 	retval.s.lo = lo;
> > 	return retval.i64;
> > }
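The T64 type is not defined in the thread; it appears to be a union overlaying a struct of two 32-bit halves on a 64-bit integer. The mangled symbol _Z12test_smlalXXiiii shows the file was compiled as C++, where reading a union member other than the one last written is not sanctioned by the standard (GCC documents it as supported). A shift-based packing, sketched here with hypothetical helper names, sidesteps that question and does not depend on the order of hi and lo in the struct matching the target's endianness.

```c
#include <stdint.h>

/* Hypothetical replacement for T64: build the RdHi:RdLo value with shifts. */
static inline uint64_t pack_lo_hi(int32_t lo, int32_t hi)
{
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* Inverse helpers: split a 64-bit accumulator back into its 32-bit halves. */
static inline int32_t unpack_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static inline int32_t unpack_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }
```

With these helpers, the three T64 lines and the return at the end of test_smlalXX would collapse to return pack_lo_hi(lo, hi);.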
> >
> > The output becomes:
> > 000000e4 <_Z12test_smlalXXiiii>:
> >   e4:	e92d0030 	push	{r4, r5}
> >   e8:	e1410382 	smlalbb	r0, r1, r2, r3
> >   ec:	e14103c2 	smlalbt	r0, r1, r2, r3
> >   f0:	e14103a2 	smlaltb	r0, r1, r2, r3
> >   f4:	e1a05001 	mov	r5, r1
> >   f8:	e14503e2 	smlaltt	r0, r5, r2, r3
> >   fc:	e1a04000 	mov	r4, r0
> >  100:	e1a01005 	mov	r1, r5
> >  104:	e8bd0030 	pop	{r4, r5}
> >  108:	e12fff1e 	bx	lr
> >
> > Basically, GCC gets confused about the return value and generates
> > useless code at the end, around the last call. I tried commenting out
> > smlaltt(&lo, &hi, a, b); in test_smlalXX, and GCC still generates
> > the same useless code around smlaltb.
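An alternative worth trying is to pass the accumulator as a single int64_t "+r" operand and let GCC choose the register pair itself. GCC's ARM back end accepts the %Q and %R operand modifiers for the low and high 32-bit registers of a 64-bit operand; they are less prominently documented than the x86 modifiers, so treat their use here as an assumption to check against your compiler version. This is only a sketch (smlalbb64 is a hypothetical name) with a portable fallback for non-ARM hosts, and whether it avoids the extra mov instructions shown above has not been verified.

```c
#include <stdint.h>

static inline int64_t smlalbb64(int64_t acc, int32_t x, int32_t y)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    /* %Q0 / %R0: low / high register of the 64-bit read-write operand acc. */
    __asm__("smlalbb %Q0, %R0, %1, %2" : "+r"(acc) : "r"(x), "r"(y));
    return acc;
#else
    /* Portable fallback: sign-extend bits 15:0 of x and y, multiply, add. */
    int32_t xb = (int32_t)(x & 0xFFFF), yb = (int32_t)(y & 0xFFFF);
    if (xb & 0x8000) xb -= 0x10000;
    if (yb & 0x8000) yb -= 0x10000;
    /* Unsigned addition so overflow wraps modulo 2^64 like the hardware. */
    return (int64_t)((uint64_t)acc + (uint64_t)(int64_t)(xb * yb));
#endif
}
```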
> 
> I have seen something similar with higher optimization levels, where some pass
> messes things up a bit.  Your
> 
>  	mov	r4, r0
> 
> is very weird, though.  I can't explain that.
> 
> -O1 generates perfect code for me, though.
> 
> Andrew.

[Pavel Pavlov] 
That's similar to the bizarre listing I sent previously. I can't explain what's happening; it just emits code that serves no purpose at all. -O1, -O2 and -O3 generate identical results for me.