> -----Original Message-----
> From: Andrew Haley [mailto:aph@xxxxxxxxxx]
> Sent: Wednesday, June 16, 2010 13:23
> To: Pavel Pavlov
> Cc: gcc-help@xxxxxxxxxxx
> Subject: Re: Inline asm for ARM
>
> On 06/16/2010 06:12 PM, Pavel Pavlov wrote:
> >> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx]
> >> On Behalf Of Andrew Haley
> >>
> >>> By the way, the version that takes hi:lo for the first int64 works fine:
> >>>
> >>> static __inline void smlalbb(int * lo, int * hi, int x, int y) {
> >>> #if defined(__CC_ARM)
> >>>     __asm { smlalbb *lo, *hi, x, y; }
> >>> #elif defined(__GNUC__)
> >>>     __asm__ __volatile__("smlalbb %0, %1, %2, %3" : "+r"(*lo),
> >>>                          "+r"(*hi)
> >>>                          : "r"(x), "r"(y));
> >>> #endif
> >>> }
> >>>
> >>>
> >>> void test_smlalXX(int hi, int lo, int a, int b) {
> >>>     smlalbb(&hi, &lo, a, b);
> >>>     smlalbt(&hi, &lo, a, b);
> >>>     smlaltb(&hi, &lo, a, b);
> >>>     smlaltt(&hi, &lo, a, b);
> >>> }
> >>>
> >>> Translates directly into four asm opcodes
> >>
> >> Mmmm, but the volatile is wrong. If you need volatile to stop gcc
> >> from deleting your asm, you have a mistake somewhere.
> >
> > I had to add volatile when I had that mess with "=&r" and "0", now I
> > think it might be removed.
> >
> > Just tested, and I still need that. The reason I needed that was
> > because my test function was a noop:
> >
> > void test_smlalXX(int lo, int hi, int a, int b) {
> >     smlalbb(&lo, &hi, a, b);
> >     smlalbt(&lo, &hi, a, b);
> >     smlaltb(&lo, &hi, a, b);
> >     smlaltt(&lo, &hi, a, b);
> > }
> >
> > Gcc correctly guesses that there is no side effect from that function
> > if I don't use volatile.
> > So, I removed volatile and added a return value to that function:
> >
> > uint64_t test_smlalXX(int lo, int hi, int a, int b) {
> >     smlalbb(&lo, &hi, a, b);
> >     smlalbt(&lo, &hi, a, b);
> >     smlaltb(&lo, &hi, a, b);
> >     smlaltt(&lo, &hi, a, b);
> >
> >     T64 retval;
> >
> >     retval.s.hi = hi;
> >     retval.s.lo = lo;
> >     return retval.i64;
> > }
> >
> > The output becomes:
> > 000000e4 <_Z12test_smlalXXiiii>:
> >   e4:   e92d0030    push    {r4, r5}
> >   e8:   e1410382    smlalbb r0, r1, r2, r3
> >   ec:   e14103c2    smlalbt r0, r1, r2, r3
> >   f0:   e14103a2    smlaltb r0, r1, r2, r3
> >   f4:   e1a05001    mov     r5, r1
> >   f8:   e14503e2    smlaltt r0, r5, r2, r3
> >   fc:   e1a04000    mov     r4, r0
> >  100:   e1a01005    mov     r1, r5
> >  104:   e8bd0030    pop     {r4, r5}
> >  108:   e12fff1e    bx      lr
> >
> > Basically gcc gets confused about the return variable and generates
> > useless gunk at the end for the last function. I tried commenting out
> > smlaltt(&lo, &hi, a, b); in test_smlalXX, and gcc still generates
> > that same useless code around smlaltb.
>
> I have seen something similar with higher optimization levels, where some pass
> messes things up a bit. Your
>
>     mov r4, r0
>
> is very weird, though. I can't explain that.
>
> -O1 generates perfect code for me, though.
>
> Andrew.

[Pavel Pavlov]
That's similar to the bizarre listing I sent previously. I can't explain what's
happening; gcc just emits some code that has no meaning at all.
-O1, -O2 and -O3 generate identical results for me.
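
For reference, here is a minimal sketch of the non-volatile variant discussed above, written so that it also builds off-target. Several things in it are assumptions and do not come from the thread: pack64() stands in for the T64 union (whose definition isn't shown), the DEFINE_SMLAL macro and the sext16/half_b/half_t/smlal_c helpers are hypothetical names, and the asm path is selected with the ACLE macro __ARM_FEATURE_DSP. On any other target a plain C fallback emulates what the SMLAL<x><y> instructions compute (hi:lo += signed 16-bit half of x times signed 16-bit half of y). Since the accumulated value is returned, the asm outputs are live and volatile should not be needed to keep gcc from deleting the asm.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of h (portable, no narrowing casts). */
static inline int32_t sext16(uint32_t h) {
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}
/* Bottom / top signed halfword of a 32-bit operand. */
static inline int32_t half_b(int x) { return sext16((uint32_t)x); }
static inline int32_t half_t(int x) { return sext16((uint32_t)x >> 16); }

/* Hypothetical stand-in for the thread's T64 union: combine hi:lo. */
static inline uint64_t pack64(int lo, int hi) {
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

/* C fallback: 64-bit accumulate of a 32-bit signed product into hi:lo.
   The uint32_t -> int conversions rely on gcc's wrap-around behaviour. */
static inline void smlal_c(int *lo, int *hi, int32_t p) {
    uint64_t acc = pack64(*lo, *hi) + (uint64_t)(int64_t)p;
    *lo = (int)(uint32_t)acc;
    *hi = (int)(uint32_t)(acc >> 32);
}

#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
/* Same constraints as in the thread, but without __volatile__. */
#define DEFINE_SMLAL(name, XH, YH)                                  \
    static inline void name(int *lo, int *hi, int x, int y) {       \
        __asm__(#name " %0, %1, %2, %3"                             \
                : "+r"(*lo), "+r"(*hi) : "r"(x), "r"(y));           \
    }
#else
/* Off-target emulation so the sketch can be compiled and checked. */
#define DEFINE_SMLAL(name, XH, YH)                                  \
    static inline void name(int *lo, int *hi, int x, int y) {       \
        smlal_c(lo, hi, XH(x) * YH(y));                             \
    }
#endif

DEFINE_SMLAL(smlalbb, half_b, half_b)
DEFINE_SMLAL(smlalbt, half_b, half_t)
DEFINE_SMLAL(smlaltb, half_t, half_b)
DEFINE_SMLAL(smlaltt, half_t, half_t)

/* Returning the accumulator keeps all four asm statements live. */
uint64_t test_smlalXX(int lo, int hi, int a, int b) {
    smlalbb(&lo, &hi, a, b);
    smlalbt(&lo, &hi, a, b);
    smlaltb(&lo, &hi, a, b);
    smlaltt(&lo, &hi, a, b);
    return pack64(lo, hi);
}
```

With lo = hi = 0, a = 0x00020003 and b = 0x00040005 the four products are 15, 12, 10 and 8, so test_smlalXX returns 45. Whether gcc still emits the extra mov instructions around the last smlaltt depends on the compiler version and is not something this sketch addresses.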