Re: Floating point performance issue

Tim Prince <n8tm@xxxxxxx> · Tue, 20 Dec 2011 07:11:42 -0500

On 12/20/2011 4:52 AM, Ico wrote:
> Hello,

> I'm running the program below twice with different command line arguments. The
> argument is used as a floating point scaling factor in the code, but does not
> change the algorithm in any way.  I am baffled by the difference in run time of
> the two runs, since the program flow is not altered by the argument.
Evidently, your definition of program flow doesn't take into account 
changes in your choice of architecture or number of exceptions.
Is your interest in x87 behavior due to historic considerations?
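You can check directly whether the argument changes only the data, or also how the hardware has to handle it. The sketch below is an illustration, not code from the thread. It assumes a C99 compiler and reruns the recurrence from the test program quoted below, using separate arrays instead of `struct t`. It then classifies the final `a` values with `fpclassify()` from `<math.h>`. The names `run_census` and `struct fp_census` are hypothetical.

With an argument of 0.1, a large part of the array should end up infinite or NaN; with 0.0001, everything should stay finite. Which of those special values actually triggers a slow path, such as an x87 assist, depends on the processor.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: how many values fall into each C99 class. */
struct fp_census {
    size_t zero, normal, subnormal, infinite, nan;
};

/* Hypothetical helper: runs the same recurrence as the test program
   (n elements, s sweeps, f[i] = i * scale) and classifies the final
   value of every a[i]. Returns an all-zero census if allocation fails. */
struct fp_census run_census(double scale, size_t n, size_t s)
{
    struct fp_census c = {0, 0, 0, 0, 0};
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *f = malloc(n * sizeof *f);

    if (a && b && f) {
        for (size_t i = 0; i < n; i++) {
            a[i] = 0.0;
            b[i] = 1.0;
            f[i] = (double)i * scale;
        }
        /* Same update order as the original inner loop. */
        for (size_t j = 0; j < s; j++) {
            for (size_t i = 0; i < n; i++) {
                a[i] += b[i] * f[i];
                b[i] -= a[i] * f[i];
            }
        }
        for (size_t i = 0; i < n; i++) {
            switch (fpclassify(a[i])) {
            case FP_ZERO:      c.zero++;      break;
            case FP_NORMAL:    c.normal++;    break;
            case FP_SUBNORMAL: c.subnormal++; break;
            case FP_INFINITE:  c.infinite++;  break;
            case FP_NAN:       c.nan++;       break;
            }
        }
    }
    free(a);
    free(b);
    free(f);
    return c;
}
```

For example, `run_census(0.1, 4000, 5000)` mirrors the slow command line and `run_census(0.0001, 4000, 5000)` mirrors the fast one.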

> $ gcc -O3 t.c

> $ time ./a.out 0.1

> real	0m7.300s
> user	0m7.286s
> sys	0m0.007s

> $ time ./a.out 0.0001

> real	0m0.060s
> user	0m0.058s
> sys	0m0.003s

> The second run is about 120 times faster than the first.

> I did some quick tests using the 'perf' profiling utility on Linux, and
> it seems that the slow run has about 70% branch misses, which I guess
> might kill performance drastically.

> I am able to reproduce this on multiple i686 boxes using various gcc versions
> (4.4, 4.6). Compiling on x86_64 does not show this behaviour.
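A plausible reason the x86_64 build behaves differently is that it uses a different floating point unit. GCC targeting 32-bit x86 evaluates double arithmetic on the x87 stack by default, while x86_64 GCC uses SSE2.

As a hedged sketch, the C99 `FLT_EVAL_METHOD` macro from `<float.h>` exposes this at compile time. It is typically 2 for x87 code, where double expressions are evaluated in long double precision, and 0 for SSE2 code. The function name `describe_eval_method` is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: translates the C99 FLT_EVAL_METHOD value into a
   short description of how this translation unit evaluates floating
   point expressions. */
const char *describe_eval_method(void)
{
#if !defined(FLT_EVAL_METHOD)
    return "FLT_EVAL_METHOD not defined";
#elif FLT_EVAL_METHOD == 0
    return "each type in its own precision (e.g. SSE2)";
#elif FLT_EVAL_METHOD == 1
    return "float and double in double precision";
#elif FLT_EVAL_METHOD == 2
    return "float and double in long double precision (e.g. x87)";
#else
    return "indeterminable or implementation-defined";
#endif
}
```

Printing `describe_eval_method()` from a file compiled with plain `gcc` on i686, and again with `gcc -msse2 -mfpmath=sse`, should show the change. Building t.c with the same flags is a way to exercise the SSE path on the i686 boxes.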

> Is anybody able to reproduce this issue, and how can this be explained?
If you had turned on your search engine, you would have seen the 
articles about "x87 Floating Point Assist."
Did you also test SSE code with and without abrupt underflow?
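Abrupt underflow refers to the SSE flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits in the MXCSR control register. When set, the hardware replaces subnormal results and inputs with zero instead of handling them gradually.

Below is a minimal sketch of switching them for such a test, assuming GCC or Clang on x86. It reads and writes MXCSR with `_mm_getcsr()` and `_mm_setcsr()` from `<xmmintrin.h>`, using the architectural bit positions (FTZ is bit 15, DAZ is bit 6). The wrapper `set_abrupt_underflow` is hypothetical.

MXCSR only governs SSE arithmetic, so the function does nothing and returns false unless `__SSE2_MATH__` is defined. GCC defines that macro when doubles are computed with SSE2, which is the x86_64 default, or on i686 with `-msse2 -mfpmath=sse`.

```c
#include <stdbool.h>

#if defined(__SSE2_MATH__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */

/* MXCSR control bits: FTZ is bit 15, DAZ is bit 6.
   DAZ is missing on some very early SSE processors, where setting it may fault. */
#define MXCSR_FTZ 0x8000u
#define MXCSR_DAZ 0x0040u

/* Hypothetical wrapper: turns SSE abrupt underflow (FTZ + DAZ) on or off
   for the calling thread. Returns true if double arithmetic is affected. */
bool set_abrupt_underflow(bool on)
{
    unsigned int csr = _mm_getcsr();
    if (on)
        csr |= MXCSR_FTZ | MXCSR_DAZ;
    else
        csr &= ~(MXCSR_FTZ | MXCSR_DAZ);
    _mm_setcsr(csr);
    return true;
}
#else
/* Doubles are not computed with SSE2 here (e.g. x87 code): nothing to toggle. */
bool set_abrupt_underflow(bool on)
{
    (void)on;
    return false;
}
#endif
```

To answer the question for an SSE build, call `set_abrupt_underflow(true)` before the main loop of t.c and compare timings against a run without it.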

> Thanks,

> Ico

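/*
 * gcc -O3 test.c && ./a.out NUMBER
 */

#include <stdio.h>
#include <stdlib.h>

#define N 4000  /* number of elements */
#define S 5000  /* number of sweeps over the array */

struct t {
        double a, b, f;
};

int main(int argc, char **argv)
{
        int i, j;
        struct t t[N];
        double f;

        if (argc < 2) {
                fprintf(stderr, "usage: %s NUMBER\n", argv[0]);
                return 1;
        }
        f = atof(argv[1]);

        for (i = 0; i < N; i++) {
                t[i].a = 0;
                t[i].b = 1;
                t[i].f = i * f;
        }

        for (j = 0; j < S; j++) {
                for (i = 0; i < N; i++) {
                        t[i].a += t[i].b * t[i].f;
                        t[i].b -= t[i].a * t[i].f;
                }
        }

        /* Return part of the result so the computation is observable. */
        return t[1].a;
}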
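The argument also decides whether the values stay bounded at all. Write f for one element's `t[i].f`. One sweep of the inner loop is then a linear map with determinant 1 (standard linear-recurrence analysis, added here for illustration, not taken from the thread):

```latex
\begin{aligned}
a_{j+1} &= a_j + f\,b_j \\
b_{j+1} &= b_j - f\,a_{j+1} = -f\,a_j + (1 - f^2)\,b_j
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{pmatrix} a_{j+1} \\ b_{j+1} \end{pmatrix}
= M \begin{pmatrix} a_j \\ b_j \end{pmatrix},
\quad
M = \begin{pmatrix} 1 & f \\ -f & 1 - f^2 \end{pmatrix},
\quad \det M = 1,
\quad \operatorname{tr} M = 2 - f^2,
\quad \lambda^2 - (2 - f^2)\,\lambda + 1 = 0 .
```

For 0 < f < 2 the trace lies in (-2, 2). The eigenvalues are then a complex-conjugate pair of modulus 1, and the element oscillates without growing. For f > 2 the eigenvalues are real and negative with product 1, so one of them has magnitude above 1. The element then grows geometrically until it overflows.

With the argument 0.1, f = 0.1 i exceeds 2 for every i ≥ 21, which is almost the whole array. With 0.0001, f never exceeds 0.4.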

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz
stepping	: 11

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.6/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-7'
   --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
   --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
   --program-suffix=-4.6 --enable-shared --enable-linker-build-id
   --with-system-zlib --libexecdir=/usr/lib --without-included-gettext
   --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
   --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
   --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
   --enable-targets=all --with-arch-32=i586 --with-tune=generic
   --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
   --target=i486-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-7)

--
Tim Prince