> -----Original Message-----
> Behalf Of Andrew Haley
> Sent: Wednesday, June 16, 2010 13:00
> To: gcc-help@xxxxxxxxxxx
> Subject: Re: Inline asm for ARM
>
> On 06/16/2010 05:57 PM, Pavel Pavlov wrote:
> >
> >
> >> -----Original Message-----
> >> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx]
> >> On Behalf Of Andrew Haley
> >> Sent: Wednesday, June 16, 2010 12:52
> >> To: gcc-help@xxxxxxxxxxx
> >> Subject: Re: Inline asm for ARM
> >>
> >> On 06/16/2010 05:40 PM, Pavel Pavlov wrote:
> >>>> -----Original Message-----
> >>>> From: Andrew Haley [mailto:aph@xxxxxxxxxx] On 06/16/2010 05:11 PM,
> >>>> Pavel Pavlov wrote:
> >>>>>> -----Original Message-----
> >>>>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> >>>>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> >>>>> ...
> >>>>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi) {
> >>>>>>   union
> >>>>>>   {
> >>>>>>     uint64_t ll;
> >>>>>>     struct
> >>>>>>     {
> >>>>>>       unsigned int l;
> >>>>>>       unsigned int h;
> >>>>>>     } s;
> >>>>>>   } retval;
> >>>>>>
> >>>>>>   retval.ll = acc;
> >>>>>>
> >>>>>>   __asm__("smlalbb %0, %1, %2, %3"
> >>>>>>           : "+r"(retval.s.l), "+r"(retval.s.h)
> >>>>>>           : "r"(lo), "r"(hi));
> >>>>>>
> >>>>>>   return retval.ll;
> >>>>>> }
> >>>>>>
> >>>>>
> >>>>> [Pavel Pavlov]
> >>>>> Later on I found out that I had to use the +r constraint, but then,
> >>>>> when I use that function, for example like this:
> >>>>> int64_t rsmlalbb64(int64_t i, int x, int y) {
> >>>>>     return smlalbb64(i, x, y);
> >>>>> }
> >>>>>
> >>>>> Gcc generates this asm:
> >>>>> <rsmlalbb64>:
> >>>>>     push    {r4, r5}
> >>>>>     mov     r4, r0
> >>>>>     mov     ip, r1
> >>>>>     smlalbb r4, ip, r2, r3
> >>>>>     mov     r5, ip
> >>>>>     mov     r0, r4
> >>>>>     mov     r1, ip
> >>>>>     pop     {r4, r5}
> >>>>>     bx      lr
> >>>>>
> >>>>> It's bizarre what gcc is doing in that function. I'd understand if
> >>>>> it couldn't optimize and use r0 and r1 directly, but from that
> >>>>> listing it looks as if gcc got drunk and decided to touch r5
> >>>>> for absolutely no reason!
> >>>>>
> >>>>> The expected output should have been this:
> >>>>> <rsmlalbb64>:
> >>>>>     smlalbb r0, r1, r2, r3
> >>>>>     bx      lr
> >>>>>
> >>>>> I'm using cegcc 4.1.0 and I compile with
> >>>>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
> >>>>>
> >>>>> Is there a way to access the individual parts of that 64-bit input
> >>>>> integer, or is there a way to specify that two 32-bit integers
> >>>>> should be treated as the Hi:Lo parts of a 64-bit variable? It's
> >>>>> commonly done with a temporary, but the result is that gcc
> >>>>> generates too much junk.
> >>>>
> >>>> Why don't you just use the function I sent above?  It generates
> >>>>
> >>>> smlalbb:
> >>>>     smlalbb r0, r1, r2, r3
> >>>>     mov     pc, lr
> >>>>
> >>>> smlalXX64:
> >>>>     smlalbb r0, r1, r2, r3
> >>>>     smlalbt r0, r1, r2, r3
> >>>>     smlaltb r0, r1, r2, r3
> >>>>     smlaltt r0, r1, r2, r3
> >>>>     mov     pc, lr
> >>>>
> >>>
> >>> [Pavel Pavlov]
> >>> What's your gcc -v? The output I posted comes from your function.
> >>
> >> 4.3.0
> >>
> >> Perhaps your compiler options were wrong?  Dunno.
> >>
> >
> >
> > [Pavel Pavlov]
> > It's kind of difficult to get that part wrong :)
>
> It's not.  Trust me, I have been on gcc-help for a _long_ while...
>
> I've even seen complaints about poor code when optimization is disabled.
>
> Andrew.
>
> > Andrew.
> >
> > I saw that there are some changes between 4.1.0 and 4.3.0 in the arm code;
> > the optimizer might have been improved between the two versions as well. So,
> > I'm building 4.4.0 now to see if it fixes the problem.

[Pavel Pavlov]
Well, of course I enable optimization. -O3, I suppose, is enough for this
simple case. That's why I said that it's difficult to get that wrong. Without
optimization it would generate something quite different (without inlining,
etc.).
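
For anyone comparing compiler versions on this, a plain C model of the instruction can help check that an inline-asm wrapper still computes the right value, independently of how good the generated code is. The following is only a sketch, assuming the documented SMLALBB behaviour (the bottom 16 bits of each operand are sign-extended, multiplied, and the product is added to the 64-bit RdHi:RdLo accumulator). The names smlalbb_ref and smlalbb_ref_selfcheck are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (not from the thread): plain C version of
 * what SMLALBB is documented to compute, so it runs on any host. */
static inline int64_t smlalbb_ref(int64_t acc, uint32_t x, uint32_t y)
{
    /* Take the bottom halfword of each operand and sign-extend it.
     * The narrowing cast to int16_t wraps modulo 2^16 on GCC. */
    int32_t xb = (int16_t)(x & 0xFFFFu);
    int32_t yb = (int16_t)(y & 0xFFFFu);

    /* The 16x16 signed product always fits in 32 bits. The accumulate is
     * done in unsigned arithmetic so that wraparound is well defined. */
    return (int64_t)((uint64_t)acc + (uint64_t)(int64_t)(xb * yb));
}

/* Small self-check; the top halfwords must not affect the result. */
static inline void smlalbb_ref_selfcheck(void)
{
    assert(smlalbb_ref(0, 3u, 4u) == 12);
    assert(smlalbb_ref(0, 0x12340003u, 0xABCD0004u) == 12);
}
```

On the ARM target, comparing smlalbb_ref(acc, x, y) with the result of the inline-asm wrapper for a few inputs is one way to confirm that the "+r" operands are tied to the correct halves of the accumulator.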