Re: binary compiled with -O1 and w/ individual optimization flags are not the same

CSights <csights@xxxxxxxxxxx> · Sat, 1 Mar 2008 10:57:25 -0500



Hi Brian and list, 

> What data types are used in this calculation?  floats only offer ~7
> decimals and doubles only ~14 decimals of precision, so there's no way
> you can compare past that and expect consistency, unless you're using
> some kind of bignum package.

	Currently using doubles, but thanks for reminding me about the number of 
decimals that make sense.

> By default calculations on the 387 are done by the hardware in 80 bits
> precision, but truncated down to 64 (assuming double types) when moved
> out of the registers.  There are a number of ways to deal with it, or at
> least expose it:
>
> -ffloat-store will cause gcc to always move intermediate results out of
> registers and into memory, which effectively gets rid of the excess
> precision at the cost of a speed hit.

	Progress! The program's outputs now group as follows:
(O0 -ffloat-store == O1 -ffloat-store == O2 -ffloat-store) != (O0) != (O1 == O2 
== O3)  In other words, with -ffloat-store added, O0 now matches O1 and O2, 
even though those builds still don't match any of the Ox builds without 
-ffloat-store.
	Does this suggest to you that the mismatched output was due to floating-point 
precision differences rather than other problems (aliasing, for example)?
	Also, I didn't mention earlier (did I?) that the program's output when 
compiled on the Macintosh matched at all optimization levels (O0 == O1 == 
O2), though it did not match any output from the program compiled on 
Linux.  Is this possibly because the Mac has SSE2 (Core 2 Duo) and is able to 
use those instructions, which have more meaningful decimal places?
	If this is the problem, what would be a good way of dealing with it?  
Throwing away the meaningless decimal digits is okay with me, but avoiding 
the performance hit that comes with -ffloat-store would be nice.  Also, it 
would be nice for the output not to depend on compiler flags.
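	For the throwing-away part, the simplest thing I can think of is to print 
fewer digits.  A minimal sketch, assuming 12 significant digits is a safe 
cutoff (a guess on my part); format_result and same_when_printed are made-up 
names, not functions in my program:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIG_DIGITS 12  /* assumption: number of digits worth keeping */

/* Hypothetical helper: format value with SIG_DIGITS significant digits. */
int format_result(char *buf, size_t len, double value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%.*g", SIG_DIGITS, value);
}

/* Hypothetical helper: returns 1 if a and b print identically
   once the low-order digits are dropped, 0 otherwise. */
int same_when_printed(double a, double b)
{
    char sa[64], sb[64];
    format_result(sa, sizeof sa, a);
    format_result(sb, sizeof sb, b);
    return strcmp(sa, sb) == 0;
}
```

	This would not help when a value sits right on a rounding boundary, since 
two builds could still round to a different last digit.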
	Is there a way to do the equivalent of -ffloat-store in the program code 
itself?  The goal is for the program's output to match when compiled with O0, 
O1, and O2, as it does with -ffloat-store.
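	What I have in mind is something like the sketch below, which forces 
selected values through a volatile double so that they have to be written to 
memory as 64 bits.  store_double and dot_rounded are hypothetical names, and 
the dot product is only a stand-in for my real calculation; I have not 
confirmed that this gives the same results as -ffloat-store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: force x through a 64-bit memory slot.
   A volatile object must actually be written and read back, so the
   value cannot stay in an 80-bit x87 register across this call. */
static double store_double(double x)
{
    volatile double slot = x;
    return slot;
}

/* Hypothetical stand-in for the real calculation: a dot product
   whose intermediate results are rounded to double at every step. */
double dot_rounded(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double prod = store_double(x[i] * y[i]);  /* round the product */
        sum = store_double(sum + prod);           /* round the running sum */
    }
    return sum;
}
```

	The appeal is that only the few expressions that matter would pay for the 
extra stores, instead of the whole program.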
	I've tried using floats only in what I guess is the key calculation 
involving exp(), then casting to double (so that I don't have to modify 
all the code to be float), but this doesn't result in matching output between 
O1 and O0.  Does the compiler do any recasting of float->double or 
double->float behind the scenes?
	Another way might be to use doubles, zero out the least significant bits 
that a float does not have, and then use these modified doubles in the 
calculation?
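	A rough sketch of the bit-zeroing idea, assuming IEEE 754 doubles (52 
explicit mantissa bits versus 23 for a float, so 29 low bits to clear).  
chop_to_float_mantissa is a made-up name.  Note that this truncates toward 
zero rather than rounding, so in general it would not give the same value as 
a cast to float:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 52 mantissa bits in a double minus 23 in a float (IEEE 754 assumed). */
#define EXTRA_MANTISSA_BITS 29

/* Hypothetical helper: clear the low 29 mantissa bits of x, leaving
   the sign, the exponent, and the top 23 mantissa bits untouched. */
double chop_to_float_mantissa(double x)
{
    uint64_t bits;

    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to view the bits */
    bits &= ~((UINT64_C(1) << EXTRA_MANTISSA_BITS) - 1);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

	The result keeps a double's exponent range, so it is only float-like in 
the mantissa.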

Thanks again!
	C.