> -----Original Message-----
> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> > On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
...
> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi)
> {
>   union
>   {
>     uint64_t ll;
>     struct
>     {
>       unsigned int l;
>       unsigned int h;
>     } s;
>   } retval;
>
>   retval.ll = acc;
>
>   __asm__("smlalbb %0, %1, %2, %3"
>           : "+r"(retval.s.l), "+r"(retval.s.h)
>           : "r"(lo), "r"(hi));
>
>   return retval.ll;
> }
>

[Pavel Pavlov]
Later on I found out that I had to use the +r constraint, but then, when I use that function, for example like this:

int64_t rsmlalbb64(int64_t i, int x, int y)
{
  return smlalbb64(i, x, y);
}

GCC generates this asm:

<rsmlalbb64>:
  push    {r4, r5}
  mov     r4, r0
  mov     ip, r1
  smlalbb r4, ip, r2, r3
  mov     r5, ip
  mov     r0, r4
  mov     r1, ip
  pop     {r4, r5}
  bx      lr

It's bizarre what gcc is doing in that function. I would understand if it couldn't optimize this and use r0 and r1 directly, but from that listing it looks as if gcc got drunk and decided to touch r5 for absolutely no reason!

The expected output should have been:

<rsmlalbb64>:
  smlalbb r0, r1, r2, r3
  bx      lr

I'm using cegcc 4.1.0 and I compile with:

arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj

Is there a way to access the individual parts of that 64-bit input integer, or is there a way to specify that two 32-bit integers should be treated as the Hi:Lo parts of a 64-bit variable? It's commonly done with a temporary, but the result is that gcc generates too much junk.
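
For reference, one possible way to do this without the union or a temporary is sketched below. It assumes GCC's ARM operand modifiers %Q and %R, which print the register holding the low and the high word of a 64-bit operand, so the whole accumulator can be passed as a single "+r" operand. Whether the 4.1.0-based cegcc accepts them, and whether it then emits the two-instruction listing, has not been checked here. The non-ARM branch is a plain C fallback added only so the sketch builds and can be checked on other hosts; it relies on the usual wrap-around behaviour when narrowing to int16_t. The ARM branch further assumes a core with the DSP multiply instructions, such as the arm1136j-s used above.

```c
#include <stdint.h>

/* SMLALBB semantics: acc += (bottom halfword of x) * (bottom halfword of y),
   both halfwords taken as signed 16-bit values, accumulated in 64 bits. */
static inline int64_t smlalbb64(int64_t acc, int x, int y)
{
#if defined(__GNUC__) && defined(__arm__)
    /* %Q0 / %R0: registers holding the low / high word of operand 0.
       Assumption: the compiler in use supports these modifiers. */
    __asm__("smlalbb %Q0, %R0, %1, %2"
            : "+r"(acc)
            : "r"(x), "r"(y));
    return acc;
#else
    /* Portable fallback with the same arithmetic, for non-ARM builds. */
    int32_t product = (int32_t)(int16_t)x * (int32_t)(int16_t)y;
    return acc + (int64_t)product;
#endif
}

/* The wrapper from the message above, unchanged apart from linkage. */
static inline int64_t rsmlalbb64(int64_t i, int x, int y)
{
    return smlalbb64(i, x, y);
}
```

With a single 64-bit operand the register allocator is free to keep the accumulator in r0:r1, which is where the calling convention already places both the argument and the return value, so in principle no extra moves are needed.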