> -----Original Message-----
> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx] On
> Behalf Of Andrew Haley
> Sent: Wednesday, June 16, 2010 12:58
> To: gcc-help@xxxxxxxxxxx
> Subject: Re: Inline asm for ARM
>
> On 06/16/2010 05:54 PM, Pavel Pavlov wrote:
> >> -----Original Message-----
> >> Behalf Of Pavel Pavlov
> >> Sent: Wednesday, June 16, 2010 12:40
> >> To: Andrew Haley
> >> Cc: gcc-help@xxxxxxxxxxx
> >> Subject: RE: Inline asm for ARM
> >>
> >>> -----Original Message-----
> >>> From: Andrew Haley [mailto:aph@xxxxxxxxxx]
> >>> On 06/16/2010 05:11 PM, Pavel Pavlov wrote:
> >>>>> -----Original Message-----
> >>>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> >>>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> >>>> ...
> >>>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi) {
> >>>>>   union
> >>>>>   {
> >>>>>     uint64_t ll;
> >>>>>     struct
> >>>>>     {
> >>>>>       unsigned int l;
> >>>>>       unsigned int h;
> >>>>>     } s;
> >>>>>   } retval;
> >>>>>
> >>>>>   retval.ll = acc;
> >>>>>
> >>>>>   __asm__("smlalbb %0, %1, %2, %3"
> >>>>>           : "+r"(retval.s.l), "+r"(retval.s.h)
> >>>>>           : "r"(lo), "r"(hi));
> >>>>>
> >>>>>   return retval.ll;
> >>>>> }
> >>>>>
> >>>>
> >>>> [Pavel Pavlov]
> >>>> Later on I found out that I had to use the +r constraint, but then,
> >>>> when I use that function, for example like this:
> >>>>
> >>>> int64_t rsmlalbb64(int64_t i, int x, int y) {
> >>>>     return smlalbb64(i, x, y);
> >>>> }
> >>>>
> >>>> Gcc generates this asm:
> >>>>
> >>>> <rsmlalbb64>:
> >>>>     push    {r4, r5}
> >>>>     mov     r4, r0
> >>>>     mov     ip, r1
> >>>>     smlalbb r4, ip, r2, r3
> >>>>     mov     r5, ip
> >>>>     mov     r0, r4
> >>>>     mov     r1, ip
> >>>>     pop     {r4, r5}
> >>>>     bx      lr
> >>>>
> >>>> It's bizarre what gcc is doing in that function. I understand if it
> >>>> can't optimize and correctly use r0 and r1 directly, but from that
> >>>> listing it looks as if gcc got drunk and decided to touch r5 for
> >>>> absolutely no reason!
> >>>>
> >>>> The expected output should have been like this:
> >>>>
> >>>> <rsmlalbb64>:
> >>>>     smlalbb r0, r1, r2, r3
> >>>>     bx      lr
> >>>>
> >>>> I'm using cegcc 4.1.0 and I compile with
> >>>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
> >>>>
> >>>> Is there a way to access the individual parts of that 64-bit input
> >>>> integer, or is there a way to specify that two 32-bit integers
> >>>> should be treated as the Hi:Lo parts of a 64-bit variable? It's
> >>>> commonly done with a temporary, but the result is that gcc generates
> >>>> too much junk.
> >>>
> >>> Why don't you just use the function I sent above? It generates
> >>>
> >>> smlalbb:
> >>>     smlalbb r0, r1, r2, r3
> >>>     mov     pc, lr
> >>>
> >>> smlalXX64:
> >>>     smlalbb r0, r1, r2, r3
> >>>     smlalbt r0, r1, r2, r3
> >>>     smlaltb r0, r1, r2, r3
> >>>     smlaltt r0, r1, r2, r3
> >>>     mov     pc, lr
> >>>
> >>
> >> [Pavel Pavlov]
> >> What's your gcc -v? The output I posted comes from your function.
> >
> > By the way, the version that takes hi:lo for the first int64 works fine:
> >
> > static __inline void smlalbb(int * lo, int * hi, int x, int y) {
> > #if defined(__CC_ARM)
> >     __asm { smlalbb *lo, *hi, x, y; }
> > #elif defined(__GNUC__)
> >     __asm__ __volatile__("smlalbb %0, %1, %2, %3"
> >                          : "+r"(*lo), "+r"(*hi)
> >                          : "r"(x), "r"(y));
> > #endif
> > }
> >
> >
> > void test_smlalXX(int hi, int lo, int a, int b) {
> >     smlalbb(&hi, &lo, a, b);
> >     smlalbt(&hi, &lo, a, b);
> >     smlaltb(&hi, &lo, a, b);
> >     smlaltt(&hi, &lo, a, b);
> > }
> >
> > It translates directly into four asm opcodes.
>
> Mmmm, but the volatile is wrong.  If you need volatile to stop gcc from
> deleting your asm, you have a mistake somewhere.
>
> Andrew.

I had to add volatile when I had that mess with "=&r" and "0"; now I think it could be removed. I just tested it, and I still need it.
The reason I needed it was that my test function was a no-op:

void test_smlalXX(int lo, int hi, int a, int b) {
    smlalbb(&lo, &hi, a, b);
    smlalbt(&lo, &hi, a, b);
    smlaltb(&lo, &hi, a, b);
    smlaltt(&lo, &hi, a, b);
}

Without volatile, gcc correctly concludes that this function has no side effects. So I removed volatile and made the function return the result:

uint64_t test_smlalXX(int lo, int hi, int a, int b) {
    smlalbb(&lo, &hi, a, b);
    smlalbt(&lo, &hi, a, b);
    smlaltb(&lo, &hi, a, b);
    smlaltt(&lo, &hi, a, b);
    T64 retval;
    retval.s.hi = hi;
    retval.s.lo = lo;
    return retval.i64;
}

The output becomes:

000000e4 <_Z12test_smlalXXiiii>:
  e4:   e92d0030        push    {r4, r5}
  e8:   e1410382        smlalbb r0, r1, r2, r3
  ec:   e14103c2        smlalbt r0, r1, r2, r3
  f0:   e14103a2        smlaltb r0, r1, r2, r3
  f4:   e1a05001        mov     r5, r1
  f8:   e14503e2        smlaltt r0, r5, r2, r3
  fc:   e1a04000        mov     r4, r0
 100:   e1a01005        mov     r1, r5
 104:   e8bd0030        pop     {r4, r5}
 108:   e12fff1e        bx      lr

Basically, gcc gets confused about the return variable and generates useless gunk around the last asm statement. I tried commenting out smlaltt(&lo, &hi, a, b); in test_smlalXX, and gcc still generates the same useless code, now around smlaltb.
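One way to attack the original question (treating Hi:Lo as a single 64-bit operand) is to hand gcc the accumulator as one int64_t "+r" operand and let the ARM operand modifiers %Q and %R name its low and high registers. This is a minimal sketch, assuming a GCC whose ARM backend accepts %Q/%R (I have not checked this against cegcc 4.1.0) and an ARMv5TE-or-later core such as the arm1136j-s. The names smlalbb64 and join_hi_lo are hypothetical. Because the asm has an output that the caller uses, it should not need volatile. The #else branch is a plain C model of SMLALBB (bottom halfwords, signed 16x16 multiply, 64-bit accumulate) that runs on non-ARM hosts and can be used to check the asm version.

```c
#include <stdint.h>

/* Hypothetical helper: combine separate lo/hi words into one 64-bit
   value with shifts rather than a union such as T64. Converting a
   value >= 2^63 to int64_t is implementation-defined; GCC wraps it. */
static inline int64_t join_hi_lo(uint32_t lo, uint32_t hi)
{
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* Hypothetical wrapper: acc += (int16)x * (int16)y, using the bottom
   halfword of each 32-bit operand, as SMLALBB does. */
static inline int64_t smlalbb64(int64_t acc, int32_t x, int32_t y)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    /* Assumes an ARMv5TE+ core. %Q0 / %R0 are the low / high registers
       of the register pair gcc allocates for 64-bit operand 0, so no
       union or separate lo/hi variables are needed. No volatile: the
       result is an output that the caller consumes. */
    __asm__("smlalbb %Q0, %R0, %1, %2" : "+r"(acc) : "r"(x), "r"(y));
    return acc;
#else
    /* Portable model of the same instruction. */
    int32_t xb = (int32_t)((uint32_t)x & 0xFFFFu);
    int32_t yb = (int32_t)((uint32_t)y & 0xFFFFu);
    if (xb >= 0x8000) xb -= 0x10000;  /* sign-extend bottom halfword */
    if (yb >= 0x8000) yb -= 0x10000;
    /* A 16x16 product always fits in 32 bits; accumulate in unsigned
       64-bit arithmetic so overflow wraps like the instruction. */
    return (int64_t)((uint64_t)acc + (uint64_t)(int64_t)(xb * yb));
#endif
}
```

With this, rsmlalbb64 above becomes just `return smlalbb64(i, x, y);`. Since the accumulator is a single 64-bit operand, gcc is free in principle to keep it in the same register pair as the argument and return value, but whether gcc 4.1 actually drops the extra moves still needs to be checked with objdump.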