Is there a compiler switch for gcc that sets the FPU to double precision, like the -pc64 switch for the Intel compiler does? I cannot find one. If there is no compiler switch to make gcc use double precision for intermediate floating-point computations, does the _FPU_SETCW macro only need to be called once per process, or once per thread in a multithreaded application, or once per processor on a multiprocessor machine?

Below is code comparing/contrasting the Intel and gcc compilers and their behaviour regarding floating-point numbers. The following code:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int iTmp = 0;
        double dTmp = -79.937384;

        printf("original %.6f\n", dTmp);
        iTmp = (int)(dTmp * 1000000.0);
        printf("int cast: %d\n", iTmp);
    }

compiled with either gcc or the Intel compiler on x86 produces the following output:

    original -79.937384
    int cast: -79937383

As discussed previously (thanks to Brian Gough for pointing me in the proper direction), this is because both compilers by default generate code that performs intermediate floating-point arithmetic in the FPU's extended-precision registers. If I modify the above program to store the computation (dTmp * 1000000.0) in a temporary double variable dTmp1 and then perform the int cast on that temporary, like this:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int iTmp = 0;
        double dTmp = -79.937384;
        double dTmp1 = 0.0;

        printf("original %.6f\n", dTmp);
        dTmp1 = dTmp * 1000000.0;
        iTmp = (int)(dTmp1);
        printf("int cast: %d\n", iTmp);
    }

I receive the following output (gcc and Intel compilers):

    original -79.937384
    int cast: -79937384

Furthermore, if I modify the code to set the FPU control word to use double precision instead of the default extended precision, like this:

    #include <stdio.h>
    #include <fpu_control.h>

    int main(int argc, char *argv[])
    {
        int iTmp = 0;
        double dTmp = -79.937384;
        fpu_control_t cw;

        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;
        cw |= _FPU_DOUBLE;
        _FPU_SETCW(cw);

        printf("original %.6f\n", dTmp);
        iTmp = (int)(dTmp * 1000000.0);
        printf("int cast: %d\n", iTmp);
    }

I receive this output (gcc and Intel compilers):

    original -79.937384
    int cast: -79937384

The Intel compiler provides a switch, -pc64, that sets the FPU to use double precision, so that the original code compiled with -pc64 returns:

    original -79.937384
    int cast: -79937384

Below are some timing results of a debug build of the above code using gcc and Intel.

Time latlngtest, Intel compiler, 1 billion iterations, using the -pc64 switch:

    original -79.937384
    int cast: -79937384
    11.460u 0.000s 0:11.46 100.0% 0+0k 0+0io 209pf+0w

Time latlngtest, Intel compiler, 1 billion iterations, without -pc64:

    original -79.937384
    int cast: -79937383
    11.480u 0.000s 0:11.47 100.0% 0+0k 0+0io 209pf+0w

Time latlngtest, 1 billion iterations, using g++:

    original -79.937384
    int cast: -79937383
    36.500u 0.000s 0:36.49 100.0% 0+0k 0+0io 146pf+0w

I didn't dig into why there were such discrepancies in time, but both were debug builds, so optimizations should not have been enabled by default.

Cheers,
Joe
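
P.S. In case the answer turns out to be that the precision-control setting is per-thread state, below is a minimal sketch of what I imagine each thread would have to do. The set_fpu_double_precision helper and the worker thread are just my illustration (not code from the program timed above), and the "repeat it in every thread" part is exactly the assumption I am asking about:

    #include <stdio.h>
    #include <pthread.h>
    #include <fpu_control.h>

    /* Set the x87 precision-control bits to double (53-bit) precision,
       the same way the third example above does it. */
    static void set_fpu_double_precision(void)
    {
        fpu_control_t cw;

        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;   /* clear the precision-control field */
        cw |= _FPU_DOUBLE;      /* select 53-bit (double) precision */
        _FPU_SETCW(cw);
    }

    static void *worker(void *arg)
    {
        (void)arg;

        /* Assumption being asked about: does this need to be repeated
           in every thread, or is the setting in main() enough? */
        set_fpu_double_precision();

        double dTmp = -79.937384;
        printf("worker int cast: %d\n", (int)(dTmp * 1000000.0));
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        set_fpu_double_precision();   /* main thread */

        pthread_create(&tid, NULL, worker, NULL);
        pthread_join(tid, NULL);

        double dTmp = -79.937384;
        printf("main int cast: %d\n", (int)(dTmp * 1000000.0));
        return 0;
    }

I would build it with something like "gcc fpu_thread_sketch.c -lpthread" and compare the two printed values.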