Hi all,

We are observing a slowdown in the scimark benchmark between gcc versions 3.3.6 and 4.2.2. In our actual application, the slowdown is almost 20%.

We currently compile with -O2 on both gcc versions. On our product, even with -O3 on gcc-4.2.2, we are not able to reach the performance of gcc-3.3.6.

We are also observing that the size of the binary has increased from 38 MB to 50 MB. Compiling with -Os didn't yield much benefit (the resulting binary was 47 MB). We guessed that the increase in size was probably due to inlining and should therefore yield better performance, but the results are disappointing.

We have profiled the application code using gprof, and we observe that almost 80% of the functions run slower when built with gcc-4.2.2 than when built with gcc-3.3.6.

Did anyone face this sort of issue earlier (slowness, 3.x vs 4.x)?

Could someone suggest a list of 4.2.2 flags that might yield a better result, or at least one on par with 3.3.6, so that we can experiment a bit and arrive at the best combination? (We cannot use the -march flag or -ffast-math, due to a strict requirement for IEEE floating-point conformance.)

I have searched through the gcc bug list and found one bug for scimark, but it was not conclusive:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431

The following are the relevant details of the gcc versions and the scimark results.

Machine
=======
Linux opteron26 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

gcc-3.3.6
=========
$ gcc -v
Reading specs from /linux/depot/gcc-3.3.6-RHEL_4/bin/../lib/gcc-lib/x86_64-redhat-linux/3.3.6/specs
Configured with: ../../src/gcc-3.3.6/configure --prefix=/depot/gcc-3.3.6-RHEL_4 --disable-shared --disable-checking --with-system-zlib --enable-threads=posix --enable-__cxa_atexit --enable-languages=c,c++,f77,objc --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.3.6
------------------------------------------------------------------------------------
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@xxxxxxxx)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          749.67
FFT             Mflops:   742.23    (N=1024)
SOR             Mflops:   648.81    (100 x 100)
MonteCarlo:     Mflops:   317.68
Sparse matmult  Mflops:   942.96    (N=1000, nz=5000)
LU              Mflops:  1096.65    (M=100, N=100)
======================================================

gcc-4.2.2
=========
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../../src/gcc-4.2.2/configure --prefix=/depot/gcc-4.2.2-static --disable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,fortran --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.2.2
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@xxxxxxxx)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          715.93
FFT             Mflops:   676.06    (N=1024)
SOR             Mflops:   659.92    (100 x 100)
MonteCarlo:     Mflops:   349.75
Sparse matmult  Mflops:   801.66    (N=1000, nz=5000)
LU              Mflops:  1092.27    (M=100, N=100)
=====================================================

Thanks in advance.

Regards,
Gowri Kumar
www.gowrikumar.com