On 12/20/2011 4:52 AM, Ico wrote:
Hello,
I'm running the program below twice with different command line arguments. The
argument is used as a floating point scaling factor in the code, but does not
change the algorithm in any way. I am baffled by the difference in run time of
the two runs, since the program flow is not altered by the argument.
Evidently, your definition of program flow doesn't take into account
changes in your choice of architecture or number of exceptions.
Is your interest in x87 behavior due to historic considerations?
$ gcc -O3 t.c
$ time ./a.out 0.1
real 0m7.300s
user 0m7.286s
sys 0m0.007s
$ time ./a.out 0.0001
real 0m0.060s
user 0m0.058s
sys 0m0.003s
The second run is about 120 times faster than the first.
I did some quick tests using the 'perf' profiling utility on Linux, and
it seems that the slow run has about 70% branch misses, which I guess
might kill performance drastically.
I am able to reproduce this on multiple i686 boxes using various gcc versions
(4.4, 4.6). Compiling on x86_64 does not show this behaviour.
Is anybody able to reproduce this issue, and how can it be explained?
If you had turned on your search engine, you would have seen the
articles about "x87 Floating Point Assist."
Did you also test SSE code with and without abrupt underflow?
Thanks,
Ico
/*
 * gcc -O3 test.c && ./a.out NUMBER
 */
#include <stdio.h>
#include <stdlib.h>

#define N 4000
#define S 5000

struct t {
    double a, b, f;
};

int main(int argc, char **argv)
{
    int i, j;
    struct t t[N];
    double f = atof(argv[1]);

    for(i=0; i<N; i++) {
        t[i].a = 0;
        t[i].b = 1;
        t[i].f = i * f;
    }

    for(j=0; j<S; j++) {
        for(i=0; i<N; i++) {
            t[i].a += t[i].b * t[i].f;
            t[i].b -= t[i].a * t[i].f;
        }
    }

    return t[1].a;
}
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 11
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-7'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-7)
--
Tim Prince