On 27/05/14 10:40, Andrew Haley wrote:
On 05/25/2014 01:32 PM, Niklas Gürtler wrote:
Hello GCC List,
I am currently working on a hardware API in C++11 for ARM Cortex-M3
microcontrollers. It provides an object-oriented way of accessing
hardware registers. The idea is that the user need not worry about
individual registers and their composition of bit fields but can access
these with symbolic names.
The API uses temporary objects and call chaining for syntactic sugar.
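
For readers without the attachment, here is a minimal sketch of what an API
in this style might look like. It is not the attached testcase: `Field`,
`RegisterWriter`, `Mode`, `Speed` and `configure_pin` are hypothetical names,
and a plain variable stands in for a memory-mapped register so that the
sketch also runs on a host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is NOT the API from the attached testcase.

// A bit field inside a 32-bit register, described by its offset and width.
template <unsigned Offset, unsigned Width>
struct Field {
    static_assert(Width > 0 && Width < 32 && Offset + Width <= 32,
                  "field must fit inside a 32-bit register");
    static constexpr unsigned offset = Offset;
    static constexpr std::uint32_t mask = ((1u << Width) - 1u) << Offset;
};

// Temporary object that collects field updates via call chaining and
// writes the register exactly once in commit().
class RegisterWriter {
public:
    explicit RegisterWriter(volatile std::uint32_t& reg)
        : reg_(reg), value_(reg) {}  // one read of the register

    template <typename F>
    RegisterWriter& set(std::uint32_t v) {
        value_ = (value_ & ~F::mask) | ((v << F::offset) & F::mask);
        return *this;  // returning *this enables chaining
    }

    void commit() { reg_ = value_; }  // one write of the register

private:
    volatile std::uint32_t& reg_;
    std::uint32_t value_;
};

// Symbolic names for two assumed fields of an assumed configuration register.
using Mode  = Field<0, 2>;
using Speed = Field<2, 2>;

// Usage: the temporary lives until the end of the full expression.
inline std::uint32_t configure_pin(volatile std::uint32_t& reg) {
    RegisterWriter{reg}.set<Mode>(2).set<Speed>(3).commit();
    return reg;
}
```

Ideally the whole chain collapses into one load, a few bit operations and
one store; the temporary itself should never need to exist in memory.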
The problem is that GCC produces correct code, but it is far too slow and
far too large.
See the attached simplified testcase (with a dummy linker script to
shorten disassembler output) and the function getInput. When compiling
with gcc-arm-embedded ( https://launchpad.net/gcc-arm-embedded ), this
is the code generated by GCC:
[snip]
But really, I think you are going down the wrong path. If you want GCC
to generate tight code, you should write tight code. Don't write lots
of pointless stuff in the hope that GCC will notice it's pointless.
Maybe it will, maybe not. Your API is rather complicated for what it
does. You should be able to write it in a way that is less work.
Andrew.
I agree with Andrew here only in part. You shouldn't worry about hitting
compiler optimisations as you write code, but you shouldn't convolute
code as a test either, BTW. When you want to improve performance, I urge
you to actually profile the code, and maybe learn how to do hypothesis
tests. You want to be able to say "it slows down here" and "after doing X
it now runs Y times better".
It is odd that GCC misses this. One of the lower passes ought to have
spotted it, right? I would have thought the RTL-level passes would have
cleaned this up. Is this an old-version thing? (Why is it being missed?)
I don't expect GCC to unravel convoluted code, but I would expect the
RTL passes to spot wasted stack space that can never be read. I'd like
to know more about how it was missed, because this shouldn't be a
target-platform thing; surely this should be done at RTL?
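
To make the question concrete, here is a minimal sketch of the kind of
pattern I mean. It is not the attached testcase; `PinRef`,
`read_via_temporary` and `read_direct` are hypothetical names. Two fields of
the temporary are written but never read, so both functions should be
equivalent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical temporary describing a pin; only `mask` is ever read.
struct PinRef {
    std::uint32_t port_index;  // written, never read
    std::uint32_t pin_index;   // written, never read
    std::uint32_t mask;        // the only field that is used
};

// Variant A: builds a temporary, then uses a single field of it.
bool read_via_temporary(std::uint32_t input_reg, std::uint32_t pin) {
    PinRef ref{0u, pin, std::uint32_t{1} << pin};
    return (input_reg & ref.mask) != 0u;
}

// Variant B: the same computation written directly.
bool read_direct(std::uint32_t input_reg, std::uint32_t pin) {
    return (input_reg & (std::uint32_t{1} << pin)) != 0u;
}
```

Compiling both with something like
`arm-none-eabi-g++ -std=c++11 -O2 -mcpu=cortex-m3 -mthumb -S` and comparing
the assembly would show whether the stores for the unused fields survive.
Whether they do for a given GCC version is exactly the open question; the
sketch only shows the source-level shape.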
Alec