PowerPC function prologue/epilogue not optimised away as expected

Hi there:

I'm trying to port an embedded application from an old proprietary compiler to GCC. The code GCC generates is bigger, and I found that, when building for the PowerPC EABI, GCC does not optimise away the function prologue and epilogue for simple wrapper functions like this one:

void SMC2Write ( const char * pBuf,    UWORD Len );

void UartWrite ( const char * pBuffer, UINT32 Length)
{  
  // This call just casts the 32-bit Length argument down to 16 bits.
  SMC2Write( pBuffer, Length );
}

Code generated by GCC (with my own comments added):
001372a8 <UartWrite>:
  ; move from link register to r0
1372a8:	7c 08 02 a6 	mflr    r0
  ; push r1 (stack pointer), create stack frame
1372ac:	94 21 ff f8 	stwu    r1,-8(r1)

  ; Clear the high-order 16 bits of r4 (truncate the 'Len' argument
  ; to SMC2Write from 32 to 16 bits)
1372b0:	54 84 04 3e 	clrlwi  r4,r4,16

  ; save r0 (the caller's return address) on the stack
1372b4:	90 01 00 0c 	stw     r0,12(r1)

1372b8:	48 00 66 c5 	bl      13d97c <SMC2Write>

  ; reload r0 (the saved return address) from the stack
1372bc:	80 01 00 0c 	lwz     r0,12(r1)
  ; destroy the stack frame
1372c0:	38 21 00 08 	addi    r1,r1,8

  ; restore the link register from r0
1372c4:	7c 08 03 a6 	mtlr    r0

1372c8:	4e 80 00 20 	blr
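
The only real work in this listing is the clrlwi instruction, which is a simplified mnemonic for rlwinm with a shift of 0 and a mask from bit 16 to bit 31. As a rough illustration of what that does to the Length argument, here is a small C model of the two instructions. The function names rotl32, ppc_mask, rlwinm and clrlwi are only helpers made up for this sketch (they are not compiler intrinsics), and bit 0 is the most significant bit, as in PowerPC numbering:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh bits (sh is reduced to 0..31). */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31u;
    return sh ? (x << sh) | (x >> (32u - sh)) : x;
}

/* Mask with ones from bit mb through bit me, PowerPC numbering
   (bit 0 = most significant). The mask wraps around if mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32u && me < 32u);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* ones from mb to 31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);  /* ones from 0 to me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of: rlwinm rA,rS,SH,MB,ME (rotate left, then AND with mask). */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* Model of: clrlwi rA,rS,n, which is rlwinm rA,rS,0,n,31. */
uint32_t clrlwi(uint32_t rs, unsigned n)
{
    return rlwinm(rs, 0u, n, 31u);
}
```

With this model, clrlwi(0x00012345, 16) yields 0x2345, the same value as casting 0x00012345 to a 16-bit unsigned type, which is all the wrapper does to its second argument.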


Equivalent code generated by the old proprietary compiler (with my own comments added). It is basically the same, except that the function prologue and epilogue are gone and the call to SMC2Write has become a plain branch (a tail call):

000f8e3c <UartWrite>:
  ; clrlwi rA,rS,16 (equivalent to rlwinm rA,rS,0,16,31)
  ; Clear the high-order 16 bits of rS and place the result into rA.
f8e3c:	54 84 04 3e 	clrlwi  r4,r4,16 

f8e40:	48 00 62 a0 	b       ff0e0 <SMC2Write>
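
The difference between the two listings comes down to the branch instruction: GCC emits bl (branch and link), which overwrites the link register and therefore forces the save/restore around it, while the old compiler emits b, which leaves the link register alone. The following sketch decodes the two instruction words from the listings to show that they differ only in the LK bit and the displacement. It assumes the standard I-form branch layout (primary opcode 18, 24-bit LI field, AA and LK bits); the Branch type and decode_branch are names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Result of decoding an I-form branch (b, bl, ba, bla). */
typedef struct {
    uint32_t target; /* branch target address                     */
    int      link;   /* 1 if the LK bit is set (bl), 0 for plain b */
} Branch;

/* Decode the 32-bit instruction word insn located at address addr. */
Branch decode_branch(uint32_t addr, uint32_t insn)
{
    assert((insn >> 26) == 18u);                /* primary opcode 18 */

    /* LI field, already shifted left by 2 (word-aligned displacement). */
    int32_t li = (int32_t)(insn & 0x03FFFFFCu);
    if (li & 0x02000000)
        li -= 0x04000000;                       /* sign-extend 26 bits */

    int absolute = (int)((insn >> 1) & 1u);     /* AA bit */

    Branch b;
    b.target = absolute ? (uint32_t)li : addr + (uint32_t)li;
    b.link   = (int)(insn & 1u);                /* LK bit */
    return b;
}
```

Decoding 0x480066c5 at address 0x1372b8 gives target 0x13d97c with the link bit set, and 0x480062a0 at address 0xf8e40 gives target 0xff0e0 with the link bit clear, matching the disassembly above.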

I'm using GCC 4.5.3; I had no luck building GCC 4.6.0. These are the specs:

Target: powerpc-unknown-eabi
Configured with: [blah blah...]/gcc-4.5.3/configure --enable-languages=c,c++ --config-cache --prefix=[blah...]/cross-toolchain-mpc8xx-eabi --target=powerpc-unknown-eabi --with-cpu=860 --without-fp --with-float=soft --disable-nls --disable-shared --disable-libssp --disable-multilib --disable-decimal-float --disable-fixed-point --with-newlib --with-gnu-ld --with-gnu-as --enable-lto
Thread model: single

These are the flags used to build the embedded application:
[blah...]/powerpc-unknown-eabi-g++ -MMD -MT 'my_file.o' -Os -fdata-sections -ffunction-sections -g3 -Wall -Wpointer-arith -Wwrite-strings -Wunused-parameter -Wextra -DHAVE_SNPRINTF  -Wformat -Wformat-security -mno-relocatable -mcpu=860 -fno-exceptions -DUSES_NEWLIB -DSTDINT_H_ALWAYS_DEFINES_UINTXX_MAX -mcpu=860 -meabi -mno-relocatable -mno-relocatable-lib -DNDEBUG=1 -D__BIG_ENDIAN__=1 -DTARGET_IS_BIG_ENDIAN  -c -o my_file.o "my_file.cpp"

Using -O3 instead of -Os does not change matters.

The embedded target has no operating system, so the generated code contains no DLLs, shared objects or dynamic relocations of any kind.

I did try to build the embedded application with Link Time Optimization, but GCC prints a terse error message about LTO not being supported on the target platform.

I found this message from Ian Lance Taylor about not being able to optimise tail calls:

    http://gcc.gnu.org/ml/gcc-help/2010-05/msg00257.html
    "It's a known limitation due to the way the PowerPC EABI works. Tailcalls are only possible to static functions."

However, that message offers no explanation and I'm not sure whether the underlying reason also applies here.
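
If the restriction quoted above is what applies, one experiment would be to give the wrapper a static callee in the same translation unit and look at the disassembly again. The sketch below is only a test case for that experiment, not a fix: the typedefs for UWORD and UINT32 are assumed (they are not shown above), SMC2WriteLocal and LastLenSeen are hypothetical stand-ins that merely record the length, and the noinline attribute is there so that GCC does not inline the callee and hide the branch. Whether GCC 4.5.3 then emits a plain b for this target has not been verified here.

```c
#include <stdint.h>

/* Assumed typedefs; the real project headers are not shown. */
typedef uint16_t UWORD;
typedef uint32_t UINT32;

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical: remembers the last length passed to the callee. */
static UWORD g_lastLen;

/* Static stand-in for SMC2Write, kept out of line on purpose. */
static NOINLINE void SMC2WriteLocal(const char *pBuf, UWORD Len)
{
    (void)pBuf;
    g_lastLen = Len;
}

/* Same shape as the wrapper in question: truncate and forward. */
void UartWrite(const char *pBuffer, UINT32 Length)
{
    SMC2WriteLocal(pBuffer, (UWORD)Length);
}

/* Hypothetical accessor so the truncation can be checked on a host. */
UWORD LastLenSeen(void)
{
    return g_lastLen;
}
```

Compiling this with the same flags and disassembling UartWrite would show whether the prologue and epilogue disappear when the callee is static; on a host machine, calling UartWrite with a length of 0x12345 leaves 0x2345 in LastLenSeen().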

Please copy me on the answer, as I'm not subscribed to this list.

Many thanks in advance,
  R. Diez



