RE: Inline asm for ARM

Pavel Pavlov <pavel@xxxxxxxxxxxxxx> · Wed, 16 Jun 2010 12:55:11 -0400



> -----Original Message-----
> From: gcc-help-owner@xxxxxxxxxxx [mailto:gcc-help-owner@xxxxxxxxxxx] On
> Behalf Of Andrew Haley
> Sent: Wednesday, June 16, 2010 12:52
> To: gcc-help@xxxxxxxxxxx
> Subject: Re: Inline asm for ARM
> 
> On 06/16/2010 05:40 PM, Pavel Pavlov wrote:
> >> -----Original Message-----
> >> From: Andrew Haley [mailto:aph@xxxxxxxxxx]
> >> On 06/16/2010 05:11 PM, Pavel Pavlov wrote:
> >>>> -----Original Message-----
> >>>> On 06/16/2010 01:15 PM, Andrew Haley wrote:
> >>>>> On 06/16/2010 11:23 AM, Pavel Pavlov wrote:
> >>> ...
> >>>> inline uint64_t smlalbb(uint64_t acc, unsigned int lo, unsigned int hi) {
> >>>>   union
> >>>>   {
> >>>>     uint64_t ll;
> >>>>     struct
> >>>>     {
> >>>>       unsigned int l;
> >>>>       unsigned int h;
> >>>>     } s;
> >>>>   } retval;
> >>>>
> >>>>   retval.ll = acc;
> >>>>
> >>>>   __asm__("smlalbb %0, %1, %2, %3"
> >>>> 	  : "+r"(retval.s.l), "+r"(retval.s.h)
> >>>> 	  : "r"(lo), "r"(hi));
> >>>>
> >>>>   return retval.ll;
> >>>> }
> >>>>
> >>>
> >>> [Pavel Pavlov]
> >>> Later on I found out that I had to use the "+r" constraint, but then,
> >>> when I use that function, for example like this:
> >>> int64_t rsmlalbb64(int64_t i, int x, int y) {
> >>> 	return smlalbb64(i, x, y);
> >>> }
> >>>
> >>> Gcc generates this asm:
> >>> <rsmlalbb64>:
> >>> push	{r4, r5}
> >>> mov	r4, r0
> >>> mov	ip, r1
> >>> smlalbb	r4, ip, r2, r3
> >>> mov	r5, ip
> >>> mov	r0, r4
> >>> mov	r1, ip
> >>> pop	{r4, r5}
> >>> bx	lr
> >>>
> >>> What gcc does in that function is bizarre. I would understand if it
> >>> couldn't optimize and use r0 and r1 directly, but from that listing
> >>> it looks as if gcc got drunk and decided to touch r5 for absolutely
> >>> no reason!
> >>>
> >>> The expected output should have been:
> >>> <rsmlalbb64>:
> >>> smlalbb	r0, r1, r2, r3
> >>> bx	lr
> >>>
> >>> I'm using cegcc 4.1.0 and I compile with
> >>> arm-mingw32ce-g++ -O3 -mcpu=arm1136j-s -c ARM_TEST.cpp -o ARM_TEST_GCC.obj
> >>>
> >>> Is there a way to access the individual halves of that 64-bit input
> >>> integer, or a way to specify that two 32-bit integers should be
> >>> treated as the Hi:Lo parts of a 64-bit variable? It's commonly done
> >>> with a temporary, but the result is that gcc generates too much junk.
> >>
> >> Why don't you just use the function I sent above?  It generates
> >>
> >> smlalbb:
> >> 	smlalbb r0, r1, r2, r3
> >> 	mov	pc, lr
> >>
> >> smlalXX64:
> >> 	smlalbb r0, r1, r2, r3
> >> 	smlalbt r0, r1, r2, r3
> >> 	smlaltb r0, r1, r2, r3
> >> 	smlaltt r0, r1, r2, r3
> >> 	mov	pc, lr
> >>
> >
> > [Pavel Pavlov]
> > What's your gcc -v? The output I posted comes from your function.
> 
> 4.3.0
> 
> Perhaps your compiler options were wrong?  Dunno.
> 
> Andrew.
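
On the question quoted above (addressing the two halves of a 64-bit
operand without a union or a temporary), here is a minimal sketch of one
possible approach. It assumes GCC's ARM operand modifiers %Q (register
holding the low word) and %R (register holding the high word) of a
64-bit operand. The helper names smlalbb_acc and sext16 are made up for
illustration and do not come from the thread. Nothing here has been
checked against cegcc 4.1.0, so whether it avoids the extra moves on
that compiler is an open question. On non-ARM targets the sketch falls
back to a plain C model of what the instruction is understood to
compute: acc += sext(x[15:0]) * sext(y[15:0]).

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of v (hypothetical helper). */
static inline int32_t sext16(int32_t v)
{
    int32_t lo = (int32_t)((uint32_t)v & 0xFFFFu);
    return (lo & 0x8000) ? lo - 0x10000 : lo;
}

/* Hypothetical wrapper: 64-bit accumulate of the product of the two
 * signed bottom halfwords of x and y. */
static inline int64_t smlalbb_acc(int64_t acc, int32_t x, int32_t y)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    /* Assumption: %Q0 / %R0 name the low / high register of the
     * register pair that GCC allocates for the 64-bit operand 0. */
    __asm__("smlalbb %Q0, %R0, %1, %2"
            : "+r"(acc)
            : "r"(x), "r"(y));
    return acc;
#else
    /* Portable reference model, used when the instruction is not
     * available. The addition is done unsigned so that it wraps the
     * way the hardware accumulate does. */
    int64_t prod = (int64_t)sext16(x) * (int64_t)sext16(y);
    return (int64_t)((uint64_t)acc + (uint64_t)prod);
#endif
}
```

With this shape the accumulator is a single "+r" operand, so the
compiler is free to keep it in whatever register pair it arrives in.
Calling smlalbb_acc(10, 0x00010002, 0x00030004) multiplies only the
bottom halfwords (2 and 4) and adds the product to the accumulator.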