> -----Original Message-----
> Behalf Of Pavel Pavlov
> Sent: Wednesday, June 16, 2010 12:40
> To: Andrew Haley
> Cc: gcc-help@xxxxxxxxxxx
> Subject: RE: Inline asm for ARM
>
> > -----Original Message-----
> > From: Andrew Haley [mailto:aph@xxxxxxxxxx]
> > On 06/16/2010 05:11 PM, Pavel Pavlov wrote:
> > >> -----Original Message-----
> > >> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> > >>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> > > ...
> > >> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi)
> > >> {
> > >>   union
> > >>   {
> > >>     uint64_t ll;
> > >>     struct
> > >>     {
> > >>       unsigned int l;
> > >>       unsigned int h;
> > >>     } s;
> > >>   } retval;
> > >>
> > >>   retval.ll = acc;
> > >>
> > >>   __asm__("smlalbb %0, %1, %2, %3"
> > >>           : "+r"(retval.s.l), "+r"(retval.s.h)
> > >>           : "r"(lo), "r"(hi));
> > >>
> > >>   return retval.ll;
> > >> }
> > >>
> > >
> > > [Pavel Pavlov]
> > > Later on I found out that I had to use the "+r" constraint, but then,
> > > when I use that function, for example like this:
> > >
> > > int64_t rsmlalbb64(int64_t i, int x, int y)
> > > {
> > >   return smlalbb64(i, x, y);
> > > }
> > >
> > > gcc generates this asm:
> > >
> > > <rsmlalbb64>:
> > >   push    {r4, r5}
> > >   mov     r4, r0
> > >   mov     ip, r1
> > >   smlalbb r4, ip, r2, r3
> > >   mov     r5, ip
> > >   mov     r0, r4
> > >   mov     r1, ip
> > >   pop     {r4, r5}
> > >   bx      lr
> > >
> > > It's bizarre what gcc is doing in that function. I understand if it
> > > can't optimize and correctly use r0 and r1 directly, but from that
> > > listing it looks as if gcc got drunk and decided to touch r5 for
> > > absolutely no reason!
> > >
> > > The expected output should have been:
> > >
> > > <rsmlalbb64>:
> > >   smlalbb r0, r1, r2, r3
> > >   bx      lr
> > >
> > > I'm using cegcc 4.1.0 and I compile with
> > > arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o
> > > arm-mingw32ce-g++ ARM_TEST_GCC.obj
> > >
> > > Is there a way to access the individual parts of that 64-bit input
> > > integer, or is there a way to specify that two 32-bit integers should
> > > be treated as the hi:lo parts of a 64-bit variable? It's commonly done
> > > with a temporary, but the result is that gcc generates too much junk.
> >
> > Why don't you just use the function I sent above?  It generates
> >
> > smlalbb:
> >         smlalbb r0, r1, r2, r3
> >         mov     pc, lr
> >
> > smlalXX64:
> >         smlalbb r0, r1, r2, r3
> >         smlalbt r0, r1, r2, r3
> >         smlaltb r0, r1, r2, r3
> >         smlaltt r0, r1, r2, r3
> >         mov     pc, lr
> >
>
> [Pavel Pavlov]
> What's your gcc -v?  The output I posted comes from your function.
>
> By the way, the version that takes hi:lo parts for the first int64 works fine:
>
> static __inline void smlalbb(int *lo, int *hi, int x, int y)
> {
> #if defined(__CC_ARM)
>     __asm { smlalbb *lo, *hi, x, y; }
> #elif defined(__GNUC__)
>     __asm__ __volatile__("smlalbb %0, %1, %2, %3"
>                          : "+r"(*lo), "+r"(*hi)
>                          : "r"(x), "r"(y));
> #endif
> }
>
> void test_smlalXX(int hi, int lo, int a, int b)
> {
>     smlalbb(&hi, &lo, a, b);
>     smlalbt(&hi, &lo, a, b);
>     smlaltb(&hi, &lo, a, b);
>     smlaltt(&hi, &lo, a, b);
> }
>
> This translates directly into four asm opcodes.
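>
> (A minimal sketch, not tested here: if a 64-bit accumulator interface is
> still wanted, the pointer-based helper above can be wrapped the same way
> as the union version earlier in the thread; the name smlalbb64 and the
> little-endian low/high word order are assumptions.)
>
> static __inline int64_t smlalbb64(int64_t acc, int x, int y)
> {
>     /* Split the 64-bit accumulator into low/high words (little-endian ARM),
>        run the smlalbb() helper defined above on them, and reassemble.
>        Needs <stdint.h> for int64_t. */
>     union { int64_t ll; struct { int l; int h; } s; } r;
>     r.ll = acc;
>     smlalbb(&r.s.l, &r.s.h, x, y);
>     return r.ll;
> }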