Re: binary compiled with -O1 and w/ individual optimization flags are not the same

Tim Prince <tprince@xxxxxxxxxxxxx> · Sat, 01 Mar 2008 09:25:28 -0800

CSights wrote:

	Currently using doubles, but thanks for reminding me about the number of 
decimals that make sense.

By default calculations on the 387 are done by the hardware in 80 bits
precision, but truncated down to 64 (assuming double types) when moved
out of the registers.  There are a number of ways to deal with it, or at
least expose it:

-ffloat-store will cause gcc to always move intermediate results out of
registers and into memory, which effectively gets rid of the excess
precision at the cost of a speed hit.

	Progress! Now the program output matching blocks are
(O0 -ffloat-store == O1 ffloat-store == O2 ffloat-store) != (O0) != (O1 == O2 
== O3)  In other words, now the O0 matches 1,2 with the addition 
of -ffloat-store, even though it still doesn't match the Ox without 
ffloat-store.
	Does this suggest to you the mismatching output was due to decimal point 
differences rather than other problems (aliasing for example)?

It suggests that you were in fact getting more than 53-bit double 
somewhere, and that it's not an aliasing error.
	Also, I didn't mention earlier (did I?) that the program's output when 
compiled on the Macintosh matched at all optimization levels.  (O0 == O1 == 
O2) (Though the output did not match any output from the program compiled on 
linux.)  Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use 
those instructions which have more meaningful decimal places?

If you use SSE2, you have no extra precision for -ffloat-store to 
suppress.  Assuming the machine where you used 387 has SSE2 hardware, 
you could set -mfpmath=sse That is the default for 64-bit gcc.
	I've tried using floats only in the what I guess is the key calculation 
involving the exp(), then casting to double (so that I don't have to modify 
all the code to be float), but this doesn't result in matching output between 
O1 and O0.  Does the compiler do any recasting of float->double double->float 
behind the scenes?

The 387 exp() performs all its calculations with extra precision.  Then, 
if you don't set -ffloat-store, it may never get rounded down.  If you 
have no SSE2 math library, you will still get 387 exp() even if you set 
-mfpmath=sse, but there will be an implicit -ffloat-store in the 
conversion of the result to SSE2.