Re: In GCC 10.2, -O2 optimization enables more than docs suggest

On 21/01/2021 04:53, mark_at_yahoo wrote:
On 1/20/21 1:17 PM, David Brown wrote:

I /never/ use -O0, precisely because I find it absolutely terrible for
assembly level debugging.  You can't see the wood for the trees, as all
local variables are on the stack, and even the simplest of C expressions
ends up with large numbers of assembly instructions.  In my experience -
and this is obviously very subjective - using -O1 gives far more
readable assembly code while avoiding the kinds of code re-arrangement
and re-ordering of -O2 that makes assembly-level debugging difficult.
(-Og is an alternative for modern gcc versions, which can give most of
the speed of -O2 but is a little easier for debugging).

Interesting. My recollection is that -O0, regardless of variables being on the stack, was more "linear": Each C statement was followed more-or-less by the assembly code required to implement it, then the next C statement and so on.

Yes, that is true. But I find (and again this is my own experience, which may not match others - it may also depend on the way you write your code) that the amount of stack manipulation code makes it impossible to see what is really happening. For each "arithmetic" assembly line doing an add, you've got two or more loads and stores to the stack. And when the "real" assembly is a load or a store, you can't see it amongst all the other stack loads and stores.

It is even worse if you use C++. If you've got a few template classes and overloaded operators, -O0 might lead you down a path of a dozen nested function calls, each with its own stack frame, when with -O1 you have just the one assembly instruction that is actually relevant.
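As a small sketch of the sort of thing I mean (the class and function names are just made up for illustration, and the exact output depends on the target and gcc version):

	struct Counter {
	    int v;
	    explicit Counter(int x) : v(x) {}
	    Counter operator+(int n) const { return Counter(v + n); }
	    int value() const { return v; }
	};

	// At -O0 the constructor, operator+ and value() are all real calls,
	// each with its own stack frame.  At -O1 this typically collapses to
	// a single add instruction, just like the plain C example below.
	int bump(int x)
	{
	    return (Counter(x) + 1).value();
	}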

With -O1, the assembly for an "add one" function:

	int inc(int x) { return x + 1; }

is:

inc(int):
        adds    r0, r0, #1
        bx      lr

That's simple, easy to understand, easy to follow, easy to step through at the assembly level.

With -O0, you get:

inc(int):
        push    {r7}
        sub     sp, sp, #12
        add     r7, sp, #0
        str     r0, [r7, #4]
        ldr     r3, [r7, #4]
        adds    r3, r3, #1
        mov     r0, r3
        adds    r7, r7, #12
        mov     sp, r7
        ldr     r7, [sp], #4
        bx      lr


I can't answer for anyone else, but I know which version /I/ would rather try to follow.

In particular, variables in -O1 can get tucked away into registers and "disappear" for long stretches of assembly before popping up again, and there is spaghetti-code jumping from common code block elimination. All of which is good optimization, of course, but it makes things hard to follow.

With -O1, you get very little "spaghetti-jumping" - generated code follows linear paths that match the source to a large extent. -O2 is a different matter - that's when a lot more of the code re-ordering and re-arranging comes in. Yes, -O1 puts data in registers - IMHO that is a /good/ thing for debugging because it makes the code simpler and clearer. And yes, it makes some code and variables "disappear". That can be a good thing (clearing away unnecessary detail), or a bad thing.
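To illustrate the kind of merging you mean with a sketch (the function is made up, and whether the tails are actually combined depends on the target and gcc version) - -O2 enables -fcrossjumping, which can merge identical tails of different branches, so single-stepping the merged block no longer tells you which branch you came through:

	int adjust(int x, int *p)
	{
	    if (x > 0) {
	        *p += 1;
	        return x * 3;    /* identical tail ...                      */
	    } else {
	        *p -= 1;
	        return x * 3;    /* ... may be cross-jumped with the one
	                            above at -O2, so the debugger shows a
	                            single shared block for both branches   */
	    }
	}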

Sometimes I'll temporarily add a "volatile" qualifier to a variable, or a "noinline" attribute to a function, in order to make debugging easier. That's part of the process. I actually do almost all of my compilation with -O2 as the starting point (and various fine-tuning flags) - and I do my debugging with that build. I strongly dislike the idea of separate debug/release builds, and do not make that distinction. If I need a lower optimisation level to help trace an awkward problem, I'll add a "#pragma GCC optimize ("O1")" line to the relevant code.
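For example, something along these lines (the function names are made up; the attribute and pragmas are standard gcc features):

	/* Keep this function out of line even in an -O2 build, so it can
	   have a breakpoint on it and be stepped through on its own.     */
	__attribute__((noinline))
	static int scale(int x)
	{
	    /* A temporary "volatile" stops the value being cached in a
	       register or removed entirely, so it stays visible in the
	       debugger.                                                   */
	    volatile int tmp = x * 3;
	    return tmp + 1;
	}

	/* Drop just the following code to -O1 without touching the build
	   flags.                                                          */
	#pragma GCC push_options
	#pragma GCC optimize ("O1")
	int awkward_to_debug(int x)
	{
	    return scale(x) - 2;
	}
	#pragma GCC pop_options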


But I'll have to revisit the issue.


Another major benefit of -O1 is that it enables much more code analysis,
which in turn enables much better static checking - I am a big fan of
warning flags and having the compiler tell me of likely problems before
I get as far as testing and debugging.

Me, too ("big fan").

Well, remember that "-O0" greatly limits these warnings.
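A typical example is -Wmaybe-uninitialized (part of -Wall), which relies on the data-flow analysis done by the optimisers. Something like this (the function is just made up for illustration) usually compiles silently with "-O0 -Wall" but gets a warning at -O1 or above:

	int pick(int x)
	{
	    int y;
	    if (x > 0)
	        y = 2 * x;
	    return y;    /* "y" may be used uninitialised when x <= 0 */
	}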

IMHO, gcc should introduce a new warning that is enabled by default at -O0, giving the message "You are using the world's most powerful compiler, but you are crippling it with your choice of flags." There should also be a warning if "-Wall" is not enabled.

(OK, perhaps that would be going a little too far...)



(Your project here looks very interesting - I'm going to have a good
look at it when I get the chance.  I won't be able to use it directly,
as a pure GPL license basically makes it unusable for anything but
learning or hobby use, but as it matches ideas I have had myself I am
interested in how it works.)

Thanks. Yes, it's basically a simple idea, and I found out recently that others have attempted something similar (which I wish I'd known when I started doing it myself). This is now very off-topic for this list, but I'd like to get your input, including the GPL vs LGPL issue (ironic, given that this is a GNU mailing list). Maybe open an issue at the GitHub repository and we can discuss it there?

I'll look through the project first, and then get back to you - either through GitHub or your email address. In the meantime, you could look at the licensing for FreeRTOS to see if its "GPL with exception" licence suits you. (gcc also has a kind of "GPL with exception" licence, otherwise it could not be used for anything other than GPL'ed code.)


mvh.,

David


