On 05.04.2013 01:08, Justin Chudgar wrote:
> I had experimentally thrown an optimization into my module's only significantly
> warm functions. Since I am a novice, this was a just-for-kicks experiment, but
> I would like to know whether to optimize at all beyond the general "-O2", and
> which platforms are critical to consider, since I only use pulse on systems that
> are sufficient to run at "-O0" without noticeable problems beyond unnecessary
> power consumption.
>
> From another thread:
>
>> I'm not sure what to think about the __attribute__((optimize(3))) usage.
>> Have you done some benchmarking that shows that the speedup is
>> significant compared to the normal -O2? If yes, I guess we can keep
>> them. <tanuk>
>
> I don't know what to think of them either. I did a really simplistic benchmark
> with the algorithm on my Core i3 laptop, initially to determine whether it was
> useful to keep everything double or float. There was no benefit to reducing
> precision on this one system, but that attribute was dramatic. I did not try -O2,
> though, just -O3 and -O0. I thought about messing with vectorization, but I only
> have x86-64 PCs, and that seems most valuable for embedded devices, which I
> cannot test at the moment.

*Any* optimization level is a *massive* speedup compared to -O0. -O0 means that all optimizations are disabled, including even basic things like keeping variables in CPU registers. The assembly generated at -O0 is probably 2-3x as large as at the other levels. Comparing -O0 against anything is not useful; use it only for debugging. For meaningful comparisons, compare -O{,s,1,2,3}.

Best regards.
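
For reference, a minimal self-contained sketch of the kind of per-function override being discussed, using GCC's optimize function attribute. The function and data here are hypothetical placeholders, not taken from the actual module:

    /* Compile with e.g. "gcc -O2 -o mix mix.c". Everything is built at -O2,
     * but the marked function is compiled as if -O3 were in effect, so the
     * attribute can be benchmarked against the file-wide level. */
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical hot function; GCC-specific attribute, equivalent to
     * __attribute__((optimize(3))). */
    __attribute__((optimize("O3")))
    static float mix_gain(const float *samples, size_t n, float gain)
    {
        /* Hot inner loop; a candidate for vectorization at -O3. */
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += samples[i] * gain;
        return acc;
    }

    int main(void)
    {
        float buf[1024];
        for (size_t i = 0; i < 1024; i++)
            buf[i] = (float)i / 1024.0f;
        printf("%f\n", (double)mix_gain(buf, 1024, 0.5f));
        return 0;
    }

The point of the sketch is only that the attribute changes the level for one function while the rest of the translation unit keeps the level given on the command line; as noted above, the meaningful baseline for any such benchmark is -O2, not -O0.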