Floating point performance issue

Hello,

I'm running the program below twice with different command line arguments. The
argument is used as a floating point scaling factor in the code, but does not
change the algorithm in any way.  I am baffled by the difference in run time of
the two runs, since the program flow is not altered by the argument.

$ gcc -O3 t.c

$ time ./a.out 0.1

real	0m7.300s
user	0m7.286s
sys	0m0.007s

$ time ./a.out 0.0001

real	0m0.060s
user	0m0.058s
sys	0m0.003s


The second run is about 120 times faster than the first.

I did some quick tests using the 'perf' profiling utility on Linux, and
it seems that the slow run has about 70% branch misses, which I guess
might kill performance drastically.

I am able to reproduce this on multiple i686 boxes using various gcc versions
(4.4, 4.6). Compiling on x86_64 does not show this behaviour.

Is anybody able to reproduce this issue, and how can it be explained?

Thanks,

Ico



/* 
 * gcc -O3 t.c && ./a.out NUMBER
 */

#include <stdio.h>
#include <stdlib.h>

#define N 4000
#define S 5000

struct t {
        double a, b, f;
};

int main(int argc, char **argv)
{
        int i, j;
        struct t t[N];
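        double f;

        /* The scaling factor is required; bail out instead of
         * dereferencing a missing argv[1]. */
        if(argc < 2) {
                fprintf(stderr, "usage: %s NUMBER\n", argv[0]);
                return 1;
        }
        f = atof(argv[1]);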

        for(i=0; i<N; i++) {
                t[i].a = 0;
                t[i].b = 1;
                t[i].f = i * f;
        }

        for(j=0; j<S; j++) {
                for(i=0; i<N; i++) {
                        t[i].a += t[i].b * t[i].f;
                        t[i].b -= t[i].a * t[i].f;
                }
        }

        return t[1].a;
}





CPU (from /proc/cpuinfo):

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz
stepping	: 11


Compiler (output of gcc -v):

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-7'
  --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
  --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
  --program-suffix=-4.6 --enable-shared --enable-linker-build-id
  --with-system-zlib --libexecdir=/usr/lib --without-included-gettext
  --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
  --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
  --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
  --enable-targets=all --with-arch-32=i586 --with-tune=generic
  --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
  --target=i486-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-7) 
-- 
:wq
^X^Cy^K^X^C^C^C^C

