On one of my programs (which has many branches in the inner loop), I've found that gcc 4.3.1 generates less efficient code than gcc 4.1.2. However, I'm not sure I'm selecting the right optimization options. For instance, here are the timings I got on various x86_64 machines. Is there something else I should test? Could this be regarded as a bug in gcc 4.3 (the generated code is correct, but the timing is unexpected)?

In the tables below, pgen=0 means without profile generation, and pgen=8 means a first compilation with -fprofile-generate, a test on a subset, a second compilation with -fprofile-use, and the timing on the obtained binary.

AMD Opteron, 2.3 GHz
--------------------
CC / CFLAGS        -O1               -O2               -O3
              pgen=0  pgen=8    pgen=0  pgen=8    pgen=0  pgen=8
gcc 3.3.6     373.7             374.8             329.7
gcc 3.4.6     330.9 / 280.4     323.9 / 278.8     281.0 / 318.0  (!!!)
gcc 4.1.2     283.9 / 214.6     237.4 / 197.1     237.2 / 197.3
gcc 4.3.1     327.4 / 238.5     232.6 / 210.4     236.3 / 210.2

Core2 Q6600, 2.40 GHz
---------------------
CC / CFLAGS        -O1               -O2               -O3
              pgen=0  pgen=8    pgen=0  pgen=8    pgen=0  pgen=8
gcc 3.3.6     262.3             269.7             267.7
gcc 3.4.6     254.9 / 260.0     266.8 / 263.6     263.4 / 266.7  (!!!)
gcc 4.1.2     240.1 / 248.6     255.7 / 238.5     255.6 / 238.7
gcc 4.3.1     270.5 / 251.8     263.2 / 242.2     263.3 / 242.3

Core2 Q9450, 2.66 GHz
---------------------
CC / CFLAGS        -O1               -O2               -O3
              pgen=0  pgen=8    pgen=0  pgen=8    pgen=0  pgen=8
gcc 3.3.6     227.1             233.6             231.9
gcc 3.4.6     220.9 / 224.4     228.6 / 228.9     228.1 / 230.7  (!!!)
gcc 4.1.2     206.8 / 215.6     221.0 / 206.7     221.0 / 206.6
gcc 4.3.1     234.8 / 218.9     228.0 / 210.0     229.0 / 210.0

Pentium D, 3.0 GHz
------------------
CC / CFLAGS        -O1               -O2               -O3
              pgen=0  pgen=8    pgen=0  pgen=8    pgen=0  pgen=8
gcc 3.3.6     317.2             345.1             344.5
gcc 3.4.6     315.3 / 329.4     337.7 / 339.6     338.9 / 342.4
gcc 4.1.3     306.3 / 312.5     316.8 / 312.8     316.6 / 313.1
gcc 4.2.2     305.1 / 314.2     305.6 / 308.2     305.1 / 308.2
gcc 4.2.4     305.5 / 314.0     305.2 / 307.6     305.2 / 308.1
gcc 4.3.1     318.3 / 311.1     313.9 / 309.2     315.4 / 309.2

Note: each test was run 3 times and I kept the median value.
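
For anyone who wants to reproduce this kind of measurement, the procedure described above (profile-guided build, then the median of 3 runs) could be scripted roughly as follows. This is only a sketch under assumptions: the names prog.c, ./prog and subset.in are hypothetical placeholders, the script measures wall-clock time (the figures above may have been obtained differently), and Python is used only for convenience. The only gcc options it relies on are -fprofile-generate and -fprofile-use, as named above.

```python
import statistics
import subprocess
import sys
import time


def pgo_build_steps(cc, opt, source, binary, train_args):
    """Return the three commands of a profile-guided build (the pgen case).

    All arguments are placeholders chosen by the caller.
    """
    return [
        [cc, opt, "-fprofile-generate", source, "-o", binary],  # instrumented build
        [binary, *train_args],                                   # training run on a subset
        [cc, opt, "-fprofile-use", source, "-o", binary],        # rebuild using the profile
    ]


def median_wall_time(cmd, runs=3):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown_percent(new, old):
    """Relative slowdown of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Print the hypothetical build commands without running them.
    for step in pgo_build_steps("gcc", "-O2", "prog.c", "./prog", ["subset.in"]):
        print(" ".join(step))

    # Stand-in command so the sketch runs anywhere; replace with the real binary.
    print(median_wall_time([sys.executable, "-c", "pass"]))

    # Opteron, -O1, pgen=0: gcc 4.3.1 (327.4) versus gcc 4.1.2 (283.9).
    print(round(slowdown_percent(327.4, 283.9), 1))
```

The last call just puts one pair of figures from the Opteron table in relative terms; it does not add any new measurement.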
The timing accuracy is about 1 second. Since this is code meant to run for millions of hours, the efficiency is really important.

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)