On 06/16/2010 01:15 PM, Andrew Haley wrote:
> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
>> I spent hours trying to get this working, but I can't find a way to do it properly.
>> In ARMv5TE there is an instruction SMLALBB http://bit.ly/amvRVv
>> SMLALBB RdLo, RdHi, Rm, Rs
>> Multiplies the bottom 16 bits of Rm by the bottom 16 bits of Rs and adds the 32-bit result to the 64-bit integer held in the register pair RdLo, RdHi.
>> So, I tried everything I can and it seems that I can't get it working.
>>
>> The closest try was:
>> static __inline void smlalbb(int * lo, int * hi, int x, int y)
>> {
>> __asm__ __volatile__("smlalbb %0, %1, %2, %3" : "=&r"(lo), "=&r"(hi) : "r"(x), "r"(y), "0"(lo), "1"(hi));
>> }
>>
>> It seems to produce the correct result, but that worked only for a simple test function; when I chained calls to this smlalbb function the results weren't correct anymore.
>>
>> The correct way would probably be to use (*lo) and (*hi) in the operand lists, but in that case it adds too many useless loads and stores (instead of translating directly to a single asm instruction it generates something like 8-10 instructions).
>
> I think it should be

Sorry, my mistake.

inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi)
{
  union
  {
    uint64_t ll;
    struct
    {
      unsigned int l;
      unsigned int h;
    } s;
  } retval;

  retval.ll = acc;

  __asm__("smlalbb %0, %1, %2, %3"
          : "+r"(retval.s.l), "+r"(retval.s.h)
          : "r"(lo), "r"(hi));

  return retval.ll;
}

Andrew.
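
For checking chained calls, it may help to have a plain C model of what SMLALBB computes and compare the asm version against it. The following is only a sketch under stated assumptions: the names bottom16, smlalbb_ref and smlalbb_asm are hypothetical, x and y are the two multiplicands (the parameters called lo and hi in the function above), and the asm variant is compiled only when the compiler defines __arm__ and __ARM_FEATURE_DSP. The asm variant splits the accumulator with shifts rather than a union, so it does not depend on the word order of a 64-bit value in memory.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of v, without relying on
   implementation-defined narrowing conversions. */
static int32_t bottom16(uint32_t v)
{
    int32_t h = (int32_t)(v & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Hypothetical reference model of SMLALBB:
   acc + bottom16(x) * bottom16(y), wrapping modulo 2^64. */
static uint64_t smlalbb_ref(uint64_t acc, uint32_t x, uint32_t y)
{
    /* The product of two signed 16-bit values always fits in 32 bits. */
    int32_t product = bottom16(x) * bottom16(y);
    return acc + (uint64_t)(int64_t)product;
}

#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
/* Assumption: the target supports the DSP multiply instructions.
   "+r" marks lo and hi as both read and written by the instruction. */
static inline uint64_t smlalbb_asm(uint64_t acc, uint32_t x, uint32_t y)
{
    uint32_t lo = (uint32_t)acc;          /* RdLo */
    uint32_t hi = (uint32_t)(acc >> 32);  /* RdHi */
    __asm__("smlalbb %0, %1, %2, %3"
            : "+r"(lo), "+r"(hi)
            : "r"(x), "r"(y));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

Chaining is then just acc = smlalbb_ref(acc, x, y) in a loop, and on an ARM build the same loop with smlalbb_asm should give identical accumulators.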