On 05/25/2014 01:32 PM, Niklas Gürtler wrote:
> Hello GCC List,
>
> i am currently working on a hardware API in C++11 for ARM Cortex-M3
> microcontrollers. It provides an object oriented way of accessing
> hardware registers. The idea is that the user need not worry about
> individual registers and their composition of bit fields but can access
> these with symbolic names.
> The API uses temporary objects and call chaining for syntactic sugar.
> The problem is now that GCC produces correct, but way too slow and too
> much code.
>
> See the attached simplified testcase (with a dummy linker script to
> shorten disassembler output) and the function getInput. When compiling
> with gcc-arm-embedded ( https://launchpad.net/gcc-arm-embedded ), this
> is the code generated by GCC:

With GCC 4.8.1 I get something similar.  I tried trunk GCC (for AArch64)
and I get:

0000000000000000 <getInput()>:
   0:   d29ffdc0    mov   x0, #0xffee             // #65518
   4:   f2a01800    movk  x0, #0xc0, lsl #16
   8:   b9400000    ldr   w0, [x0]
   c:   d3524800    ubfx  x0, x0, #18, #1
  10:   d65f03c0    ret

I think you'd get something very similar for 32-bit ARM.

But really, I think you are going down the wrong path.  If you want GCC
to generate tight code, you should write tight code.  Don't write lots
of pointless stuff in the hope that GCC will notice it's pointless.
Maybe it will, maybe not.

Your API is rather complicated for what it does.  You should be able to
write it in a way that is less work.

Andrew.
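
To illustrate the "write tight code" point, here is a minimal sketch of
a single-bit register read written directly, with no temporaries or call
chaining.  It is not the API from the attached testcase.  The names
extract_field, read_field, get_input_sketch and self_check are
hypothetical, and the address 0x00c0ffee and bit position 18 are only
read off the disassembly above.  With optimization enabled I would
expect a compiler to reduce this to roughly a load plus a bit-field
extract, but that is an expectation, not a measured result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract Width bits starting at bit Shift from a value.
template <unsigned Shift, unsigned Width>
constexpr std::uint32_t extract_field(std::uint32_t value) {
    static_assert(Width >= 1 && Shift + Width <= 32, "field must fit in 32 bits");
    // Build the mask by shifting right, so Width == 32 never shifts by 32.
    return (value >> Shift) & (0xFFFFFFFFu >> (32 - Width));
}

// Hypothetical helper: one volatile read of the register, then extract the field.
template <unsigned Shift, unsigned Width>
inline std::uint32_t read_field(const volatile std::uint32_t& reg) {
    return extract_field<Shift, Width>(reg);
}

// Assumed register address and bit position, taken from the disassembly.
// Only meaningful on the target hardware; do not call this on a host.
inline std::uint32_t get_input_sketch() {
    return read_field<18, 1>(
        *reinterpret_cast<const volatile std::uint32_t*>(std::uintptr_t{0x00c0ffee}));
}

// Host-side check using an ordinary variable in place of the hardware register.
inline void self_check() {
    std::uint32_t fake_reg = 1u << 18;
    assert((read_field<18, 1>(fake_reg)) == 1u);
    fake_reg = ~(1u << 18);
    assert((read_field<18, 1>(fake_reg)) == 0u);
}
```

Symbolic names can still be layered on top as constants or typedefs for
the Shift and Width arguments, which keeps the readable interface without
relying on the optimizer to remove intermediate objects.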