floating point extended/double precision gcc vs intel compilers

Is there a compiler switch for gcc that sets the FPU to double precision, 
as the -pc64 switch does for the Intel compiler? I cannot find one.

If there is no compiler switch that makes gcc use double precision for 
intermediate floating-point computations, does the _FPU_SETCW macro need 
to be invoked once per process, once per thread in a multithreaded 
application, or once per processor on a multiprocessor machine?
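
For reference, the per-thread case would look something like the sketch
below, where each thread calls a small helper before doing any
floating-point work. The helper name set_fpu_double and the worker thread
are purely illustrative; whether the call is actually required in every
thread (rather than once per process) is exactly what I'm asking.

#include <stdio.h>
#include <pthread.h>
#include <fpu_control.h>

/* Illustrative helper: switch the x87 precision-control field to
   double precision for whichever context executes it. */
static void set_fpu_double(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw &= ~_FPU_EXTENDED;
    cw |= _FPU_DOUBLE;
    _FPU_SETCW(cw);
}

static void *worker(void *arg)
{
    double d = -79.937384;

    (void)arg;

    /* If the control word is per-thread state, every thread would
       need this call before its floating-point work. */
    set_fpu_double();

    printf("thread cast: %d\n", (int)(d * 1000000.0));
    return NULL;
}

int main(void)
{
    pthread_t t;

    set_fpu_double();   /* main thread */
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}

(Built with the usual -lpthread.)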

Below is code comparing the behaviour of the Intel and gcc compilers 
with respect to floating-point numbers.

The Following Code:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int iTmp = 0;
    double dTmp = -79.937384;

    printf("original %.6f\n", dTmp);
    iTmp = (int)(dTmp * 1000000.0);   /* cast truncates toward zero */
    printf("int cast: %d\n", iTmp);

    return 0;
}

Compiled with either the gcc or the Intel compiler on the x86 architecture,
it produces the following output:

original -79.937384
int cast: -79937383

As discussed previously (thanks to Brian Gough for pointing me in the
proper direction), this is because both compilers generate code that, by
default, uses the FPU's extended-precision registers for intermediate
floating-point arithmetic.
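
To make the rounding visible, here is a small sketch that mimics the
extended-precision intermediate explicitly with a long double and compares
it against the same product stored in a double. It assumes an x86 build
where long double is the 80-bit extended format; the variable names ext and
dbl are just for illustration, and the exact digits printed will vary with
compiler, flags, and target.

#include <stdio.h>

int main(void)
{
    double dTmp = -79.937384;

    /* Product kept at extended precision (80-bit on x86 gcc). */
    long double ext = (long double)dTmp * 1000000.0L;

    /* Same product after being stored to (rounded to) a double. */
    double dbl = dTmp * 1000000.0;

    printf("extended product: %.20Lf -> cast %d\n", ext, (int)ext);
    printf("double  product:  %.20f -> cast %d\n", dbl, (int)dbl);
    return 0;
}

This is consistent with the outputs above: the extended product sits just
below 79937384 in magnitude, so truncating it loses the last unit, while
rounding it to double precision first lands on exactly -79937384.0.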

If I modify the above application to store the computation (dTmp *
1000000.0) in a temporary double dTmp1 and then perform the int cast on
that temporary, like this:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int iTmp = 0;
    double dTmp = -79.937384;
    double dTmp1 = 0.0;

    printf("original %.6f\n", dTmp);
    dTmp1 = dTmp * 1000000.0;   /* storing rounds the product to double */
    iTmp = (int)(dTmp1);
    printf("int cast: %d\n", iTmp);

    return 0;
}

I receive the following output (gcc and Intel compilers):

original -79.937384
int cast: -79937384

Furthermore, if I modify the code to set the FPU control word to utilize
double precision instead of using the default extended precision like this:

#include <stdio.h>
#include <fpu_control.h>

int main(int argc, char *argv[])
{
    int iTmp = 0;
    double dTmp = -79.937384;
    fpu_control_t cw;

    /* Switch the FPU's precision control from extended to double. */
    _FPU_GETCW(cw);
    cw &= ~_FPU_EXTENDED;
    cw |= _FPU_DOUBLE;
    _FPU_SETCW(cw);

    printf("original %.6f\n", dTmp);
    iTmp = (int)(dTmp * 1000000.0);
    printf("int cast: %d\n", iTmp);

    return 0;
}

I receive this output (gcc and Intel compilers):

original -79.937384
int cast: -79937384


The Intel compiler provides a -pc64 switch that sets the FPU to double
precision, so the original code, when compiled with -pc64, returns:

original -79.937384
int cast: -79937384



Below are some timing results from debug builds of the above code, using 
the gcc and Intel compilers:

Time latlngtest intel compiler 1 billion iterations using -pc64 switch:
original -79.937384
int cast: -79937384
11.460u 0.000s 0:11.46 100.0%   0+0k 0+0io 209pf+0w

Time latlngtest intel compiler 1 billion iterations without using -pc64:
original -79.937384
int cast: -79937383
11.480u 0.000s 0:11.47 100.0%   0+0k 0+0io 209pf+0w

Time latlngtest 1 billion iterations using g++:
original -79.937384
int cast: -79937383
36.500u 0.000s 0:36.49 100.0%   0+0k 0+0io 146pf+0w

I didn't dig into why there were such discrepancies in the times, but both 
were debug builds, so optimizations should not have been enabled by default.
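
For reference, the loop being timed is roughly the following; the actual
latlngtest source isn't included in this message, so the iteration
structure and the volatile are only illustrative (the volatile keeps the
work from being thrown away if the optimizer gets involved).

#include <stdio.h>

int main(void)
{
    double dTmp = -79.937384;
    volatile int iTmp = 0;
    long i;

    /* Repeat the cast one billion times, roughly what the timings
       above measure. */
    for (i = 0; i < 1000000000L; i++)
        iTmp = (int)(dTmp * 1000000.0);

    printf("original %.6f\n", dTmp);
    printf("int cast: %d\n", iTmp);
    return 0;
}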

Cheers,
Joe



