CSights wrote:
Currently using doubles, but thanks for reminding me about the number of
decimals that make sense.
By default calculations on the 387 are done by the hardware in 80 bits
precision, but truncated down to 64 (assuming double types) when moved
out of the registers. There are a number of ways to deal with it, or at
least expose it:
-ffloat-store will cause gcc to always move intermediate results out of
registers and into memory, which effectively gets rid of the excess
precision at the cost of a speed hit.
Progress! Now the program output matching blocks are
(O0 -ffloat-store == O1 ffloat-store == O2 ffloat-store) != (O0) != (O1 == O2
== O3) In other words, now the O0 matches 1,2 with the addition
of -ffloat-store, even though it still doesn't match the Ox without
ffloat-store.
Does this suggest to you the mismatching output was due to decimal point
differences rather than other problems (aliasing for example)?
It suggests that you were in fact getting more than 53-bit double
somewhere, and that it's not an aliasing error.
Also, I didn't mention earlier (did I?) that the program's output when
compiled on the Macintosh matched at all optimization levels. (O0 == O1 ==
O2) (Though the output did not match any output from the program compiled on
linux.) Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use
those instructions which have more meaningful decimal places?
If you use SSE2, you have no extra precision for -ffloat-store to
suppress. Assuming the machine where you used 387 has SSE2 hardware,
you could set -mfpmath=sse That is the default for 64-bit gcc.
I've tried using floats only in the what I guess is the key calculation
involving the exp(), then casting to double (so that I don't have to modify
all the code to be float), but this doesn't result in matching output between
O1 and O0. Does the compiler do any recasting of float->double double->float
behind the scenes?
The 387 exp() performs all its calculations with extra precision. Then,
if you don't set -ffloat-store, it may never get rounded down. If you
have no SSE2 math library, you will still get 387 exp() even if you set
-mfpmath=sse, but there will be an implicit -ffloat-store in the
conversion of the result to SSE2.