Re: binary compiled with -O1 and w/ individual optimization flags are not the same

Brian Dessent <brian@xxxxxxxxxxx> · Fri, 29 Feb 2008 14:47:02 -0800

CSights wrote:

>         Below is some output which does not match when viewed at high precision.  The
> output should match, because the number is calculated from the string
> sequence. (The output is two individuals' fitness in a genetic algorithm if
> that matters to you.)

What data types are used in this calculation?  A float only offers ~7
significant decimal digits and a double ~15-16, so there's no way you
can compare past that and expect consistency, unless you're using some
kind of bignum package.
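
As a rough illustration of comparing only up to the digits the type can
actually hold, here is a minimal C sketch.  The helper names
agree_to_digits() and print_type_limits() are made up for this example
and are not from your program; FLT_DIG and DBL_DIG are the standard
<float.h> limits.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a and b agree to roughly `digits`
 * significant decimal digits, using a relative tolerance. */
int agree_to_digits(double a, double b, int digits)
{
    assert(digits >= 0);

    double diff  = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;

    /* Tolerance = larger magnitude scaled down by 10^digits. */
    double tol = mag_a > mag_b ? mag_a : mag_b;
    for (int i = 0; i < digits; i++)
        tol /= 10.0;

    return diff <= tol;
}

/* Print the number of decimal digits each type is guaranteed to hold. */
void print_type_limits(void)
{
    printf("float  is good for %d decimal digits\n", FLT_DIG);
    printf("double is good for %d decimal digits\n", DBL_DIG);
}
```

Comparing two fitness values with something like
agree_to_digits(f1, f2, 12) rather than printing 20 digits and diffing
them would hide differences that are below what a double can represent
reliably anyway.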

>         This makes me think that there is something going on with calculation of the
> number from the string of letters.  The calculation includes use of exp(),
> but I think that is the only special thing other than + - / *.
>         I played around with "-msse -mfpunit=sse", but this doesn't seem to make a
> difference.  Are there any other math type stuff to try? Does the output of a
> program compiled with "-O2 -fno-strict-aliasing" not being the same give you
> all any clues?  Also, I tried -Wstrict-aliasing and -Wstrict-aliasing=1, but
> they didn't give any warnings/errors.

You might be running into the excess precision "problem" with the 387. 
I'm not sure that this is directly related to aliasing rules in any way,
other than you might have a mixture of both issues.
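
One quick way to see what the compiler claims about this for a given
set of flags is C99's FLT_EVAL_METHOD macro from <float.h>.  The sketch
below just reports it; the function names are invented for the example,
and whether older gcc versions report a meaningful value here is an
assumption you'd want to verify on your setup.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Map the C99 FLT_EVAL_METHOD values to a short description:
 *   0  operations are evaluated in the type's own precision
 *   1  float and double are evaluated as double
 *   2  everything is evaluated as long double (typical for the 387)
 * Anything else is indeterminable or implementation-defined. */
const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "own precision";
    case 1:  return "double";
    case 2:  return "long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Write a one-line description of this build's evaluation method. */
void describe_eval_method(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    int method = (int)FLT_EVAL_METHOD;
    snprintf(buf, len, "FLT_EVAL_METHOD = %d (%s)",
             method, eval_method_name(method));
}
```

Building this once with each set of flags you are comparing and
printing the result should show whether the two builds even agree on
how intermediate results are evaluated.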

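By default calculations on the 387 are done by the hardware in 80-bit
extended precision, and a result is only rounded down to 64 bits
(assuming double types) when it is moved out of the registers into
memory.  When that happens depends on how the optimizer allocates
registers, so different flags can give slightly different results.
There are a number of ways to deal with it, or at least expose it: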

-ffloat-store will cause gcc to keep floating point variables in memory
rather than in registers, which gets rid of most of the excess
precision (intermediate values inside a single expression can still
carry it) at the cost of a speed hit.

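-mpc64 can be used to set the 387's precision control so that results
are rounded to a 53 bit significand (double precision) instead of the
64 bit significand of the 80 bit format.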

You can tell gcc to not use the 387 unit, and to use the sse unit
instead which does not suffer from this.  That's what -mfpmath=sse does
(note it's not -mfpunit), but you also have to tell gcc that it's okay
to emit sse instructions, thus -march=(something that supports sse) or
-msse is required.  Note, however, that plain sse can only do single
precision (32 bit) floats, so if you're using doubles you need sse2;
otherwise the 387 will still be used for them.  You mentioned you're
using an Athlon-XP, which supports sse but not sse2, so that's out.
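
To check which unit a given build actually uses, and to get rid of the
excess precision by hand for a single value, something like the sketch
below may help.  It relies on the __SSE_MATH__ and __SSE2_MATH__ macros
that gcc predefines when -mfpmath=sse is in effect; the function names
are made up for the example, and round_to_double() is only a by-hand
stand-in for what -ffloat-store does, assuming an 8-byte double.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this build does float arithmetic in sse registers. */
int float_math_uses_sse(void)
{
#if defined(__SSE_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if this build does double arithmetic in sse registers (needs sse2). */
int double_math_uses_sse(void)
{
#if defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Force a value through a 64-bit memory slot so that any excess 387
 * precision is rounded away.  The volatile store keeps the compiler
 * from leaving the value in a register. */
double round_to_double(double x)
{
    assert(sizeof(double) == 8);  /* assumption of this sketch */

    volatile double stored = x;
    return stored;
}

/* Print which unit the compiler was told to use for each type. */
void report_fp_unit(void)
{
    printf("float  math: %s\n", float_math_uses_sse()  ? "sse" : "387 or other");
    printf("double math: %s\n", double_math_uses_sse() ? "sse" : "387 or other");
}
```

On an Athlon-XP build with -msse -mfpmath=sse I would expect this to
report sse for float and not for double, which matches the limitation
described above.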

Brian