On 06/16/2010 05:54 PM, Pavel Pavlov wrote:
>> -----Original Message-----
>> Behalf Of Pavel Pavlov
>> Sent: Wednesday, June 16, 2010 12:40
>> To: Andrew Haley
>> Cc: gcc-help@xxxxxxxxxxx
>> Subject: RE: Inline asm for ARM
>>
>>> -----Original Message-----
>>> From: Andrew Haley [mailto:aph@xxxxxxxxxx]
>>> On 06/16/2010 05:11 PM, Pavel Pavlov wrote:
>>>>> -----Original Message-----
>>>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
>>>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
>>>> ...
>>>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi)
>>>>> {
>>>>>   union
>>>>>   {
>>>>>     uint64_t ll;
>>>>>     struct
>>>>>     {
>>>>>       unsigned int l;
>>>>>       unsigned int h;
>>>>>     } s;
>>>>>   } retval;
>>>>>
>>>>>   retval.ll = acc;
>>>>>
>>>>>   __asm__("smlalbb %0, %1, %2, %3"
>>>>>           : "+r"(retval.s.l), "+r"(retval.s.h)
>>>>>           : "r"(lo), "r"(hi));
>>>>>
>>>>>   return retval.ll;
>>>>> }
>>>>>
>>>>
>>>> [Pavel Pavlov]
>>>> Later on I found out that I had to use the "+r" constraint, but then,
>>>> when I use that function, for example like this:
>>>>
>>>> int64_t rsmlalbb64(int64_t i, int x, int y)
>>>> {
>>>>   return smlalbb64(i, x, y);
>>>> }
>>>>
>>>> gcc generates this asm:
>>>>
>>>> <rsmlalbb64>:
>>>>   push    {r4, r5}
>>>>   mov     r4, r0
>>>>   mov     ip, r1
>>>>   smlalbb r4, ip, r2, r3
>>>>   mov     r5, ip
>>>>   mov     r0, r4
>>>>   mov     r1, ip
>>>>   pop     {r4, r5}
>>>>   bx      lr
>>>>
>>>> It's bizarre what gcc is doing in that function. I understand if it
>>>> can't optimize and use r0 and r1 directly, but from that listing it
>>>> looks as if gcc got drunk and decided to touch r5 for absolutely no
>>>> reason!
>>>>
>>>> The expected output should have been:
>>>>
>>>> <rsmlalbb64>:
>>>>   smlalbb r0, r1, r2, r3
>>>>   bx      lr
>>>>
>>>> I'm using cegcc 4.1.0 and I compile with
>>>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
>>>>
>>>> Is there a way to access the individual parts of that 64-bit input
>>>> integer, or is there a way to specify that two 32-bit integers should
>>>> be treated as the Hi:Lo parts of a 64-bit variable? It's commonly done
>>>> with a temporary, but the result is that gcc generates too much junk.
>>>
>>> Why don't you just use the function I sent above?  It generates
>>>
>>> smlalbb:
>>>   smlalbb r0, r1, r2, r3
>>>   mov     pc, lr
>>>
>>> smlalXX64:
>>>   smlalbb r0, r1, r2, r3
>>>   smlalbt r0, r1, r2, r3
>>>   smlaltb r0, r1, r2, r3
>>>   smlaltt r0, r1, r2, r3
>>>   mov     pc, lr
>>>
>>
>> [Pavel Pavlov]
>> What's your gcc -v?  The output I posted comes from your function.
>
> By the way, the version that takes hi:lo for the first int64 works fine:
>
> static __inline void smlalbb(int *lo, int *hi, int x, int y)
> {
> #if defined(__CC_ARM)
>   __asm { smlalbb *lo, *hi, x, y; }
> #elif defined(__GNUC__)
>   __asm__ __volatile__("smlalbb %0, %1, %2, %3"
>                        : "+r"(*lo), "+r"(*hi)
>                        : "r"(x), "r"(y));
> #endif
> }
>
> void test_smlalXX(int hi, int lo, int a, int b)
> {
>   smlalbb(&hi, &lo, a, b);
>   smlalbt(&hi, &lo, a, b);
>   smlaltb(&hi, &lo, a, b);
>   smlaltt(&hi, &lo, a, b);
> }
>
> Translates directly into four asm opcodes.

Mmmm, but the volatile is wrong.  If you need volatile to stop gcc from
deleting your asm, you have a mistake somewhere.

Andrew.
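
A minimal sketch of what dropping the volatile would look like, assuming the
usual GNU extended-asm semantics; only the __GNUC__ branch of Pavel's helper
is shown, the smlalbt/smlaltb/smlaltt siblings are analogous, and the names
simply follow the ones used in the thread:

static __inline void smlalbb(int *lo, int *hi, int x, int y)
{
    /* "+r" marks *lo and *hi as both read and written, so gcc keeps the
       asm alive as long as the results are actually used.  volatile would
       only be needed for side effects gcc cannot see through the operands,
       which is not the case here. */
    __asm__("smlalbb %0, %1, %2, %3"
            : "+r"(*lo), "+r"(*hi)
            : "r"(x), "r"(y));
}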