gcc 4.3 generates less efficient code than gcc 4.1 or 4.2

On one of my programs (which has many branches in its inner loop),
I've found that gcc 4.3.1 generates less efficient code than gcc 4.1.2.
However, I'm not sure I am selecting the right optimization options.

Below are the timings (in seconds) I got on several x86_64 machines.
Is there something else I should test? Could this be regarded as a bug
in gcc 4.3 (the generated code is correct, but the timing is unexpected)?

In the tables below, pgen=0 means without profile generation, and
pgen=8 means a first compilation with -fprofile-generate, a test run
on a subset, a second compilation with -fprofile-use, and the timing
of the resulting binary.
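A minimal sketch of the pgen=8 procedure, assuming a hypothetical branch-heavy function `classify_sum` as a stand-in for the real program, which the post does not show. `-fprofile-generate` and `-fprofile-use` are real GCC options; the driver file `bench.c` and its input arguments are placeholders.

```c
/* Hypothetical stand-in for a program with many branches in its
 * inner loop.  Profile-guided build (pgen=8 in the tables), assuming
 * a placeholder driver file bench.c that calls classify_sum():
 *
 *   gcc -O2 -fprofile-generate bench.c -o bench   # instrumented build
 *   ./bench small-input                           # test run on a subset
 *   gcc -O2 -fprofile-use bench.c -o bench        # rebuild with the profile
 *   time ./bench full-input                       # timed run
 */
#include <stddef.h>
#include <stdint.h>

/* Branch-heavy inner loop: which arm is taken depends on the data,
 * so the recorded profile can tell GCC which branches are hot. */
uint64_t classify_sum(const uint32_t *v, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = v[i];
        if (x % 3 == 0)
            acc += x;          /* multiples of 3 */
        else if (x % 5 == 0)
            acc ^= x;          /* multiples of 5 (not of 3) */
        else if (x & 1)
            acc += 2u * x;     /* other odd values */
        else
            acc -= 1;          /* other even values (unsigned wrap is fine) */
    }
    return acc;
}
```

With -fprofile-use, GCC can take the recorded branch counts into account when laying out and optimizing such branches, which is one reason the pgen=8 columns can differ noticeably from pgen=0.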

AMD Opteron, 2.3 GHz
--------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       373.7           374.8           329.7
gcc 3.4.6       330.9 / 280.4   323.9 / 278.8   281.0 / 318.0 (!!!)
gcc 4.1.2       283.9 / 214.6   237.4 / 197.1   237.2 / 197.3
gcc 4.3.1       327.4 / 238.5   232.6 / 210.4   236.3 / 210.2

Core2 Q6600, 2.40 GHz
---------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       262.3           269.7           267.7
gcc 3.4.6       254.9 / 260.0   266.8 / 263.6   263.4 / 266.7 (!!!)
gcc 4.1.2       240.1 / 248.6   255.7 / 238.5   255.6 / 238.7
gcc 4.3.1       270.5 / 251.8   263.2 / 242.2   263.3 / 242.3

Core2 Q9450, 2.66 GHz
---------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       227.1           233.6           231.9
gcc 3.4.6       220.9 / 224.4   228.6 / 228.9   228.1 / 230.7 (!!!)
gcc 4.1.2       206.8 / 215.6   221.0 / 206.7   221.0 / 206.6
gcc 4.3.1       234.8 / 218.9   228.0 / 210.0   229.0 / 210.0

Pentium D, 3.0 GHz
------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       317.2           345.1           344.5
gcc 3.4.6       315.3 / 329.4   337.7 / 339.6   338.9 / 342.4
gcc 4.1.3       306.3 / 312.5   316.8 / 312.8   316.6 / 313.1
gcc 4.2.2       305.1 / 314.2   305.6 / 308.2   305.1 / 308.2
gcc 4.2.4       305.5 / 314.0   305.2 / 307.6   305.2 / 308.1
gcc 4.3.1       318.3 / 311.1   313.9 / 309.2   315.4 / 309.2
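To read the tables in relative terms, here is a small sketch computing the percentage slowdown of gcc 4.3.1 relative to gcc 4.1.2, using the Opteron rows copied from above. The helper names `slowdown_percent` and `opteron_slowdowns` are hypothetical.

```c
#include <stddef.h>

#define NCOLS 6  /* -O1, -O2, -O3, each with pgen=0 and pgen=8 */

/* Opteron 2.3 GHz rows from the table above (seconds), column order:
 * -O1 pgen=0, -O1 pgen=8, -O2 pgen=0, -O2 pgen=8, -O3 pgen=0, -O3 pgen=8 */
static const double opteron_gcc412[NCOLS] = {283.9, 214.6, 237.4, 197.1, 237.2, 197.3};
static const double opteron_gcc431[NCOLS] = {327.4, 238.5, 232.6, 210.4, 236.3, 210.2};

/* Relative change in percent of t_new with respect to t_old;
 * positive means the newer compiler's code is slower. */
double slowdown_percent(double t_old, double t_new)
{
    return 100.0 * (t_new - t_old) / t_old;
}

/* Fill out[] with the gcc 4.3.1 vs gcc 4.1.2 slowdown for each column. */
void opteron_slowdowns(double out[NCOLS])
{
    for (size_t i = 0; i < NCOLS; i++)
        out[i] = slowdown_percent(opteron_gcc412[i], opteron_gcc431[i]);
}
```

For the Opteron, this gives about +15.3% for -O1 pgen=0 but about -2.0% for -O2 pgen=0, where gcc 4.3.1 is slightly faster.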

Note: each test was run 3 times and I kept the median value.
The timing accuracy is about 1 second.
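The measurement rule above (median of 3 runs) can be sketched in C as follows. This assumes CPU time measured with the standard `clock()` function and a hypothetical `void (*)(void)` benchmark entry point; the post does not say whether CPU or wall-clock time was measured.

```c
#include <time.h>

/* Median of three values: max(a, min(b, c)) after ordering a <= b. */
double median3(double a, double b, double c)
{
    if (a > b) { double t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) b = c;                           /* b = min(b, c) */
    return a > b ? a : b;                       /* max(a, min(b, c)) */
}

/* CPU time in seconds of one call to a hypothetical benchmark function. */
double cpu_time_of(void (*bench)(void))
{
    clock_t start = clock();
    bench();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run the benchmark three times and keep the median timing. */
double median_of_3_runs(void (*bench)(void))
{
    double t1 = cpu_time_of(bench);
    double t2 = cpu_time_of(bench);
    double t3 = cpu_time_of(bench);
    return median3(t1, t2, t3);
}
```

Taking the median rather than the mean keeps a single disturbed run (e.g. another process competing for the CPU) from skewing the reported value.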

Since this code is meant to run for millions of hours, efficiency
is really important.

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
