Re: binary compiled with -O1 and w/ individual optimization flags are not the same

Brian Dessent <brian@xxxxxxxxxxx> · Sat, 01 Mar 2008 10:21:46 -0800

CSights wrote:

>         Also, I didn't mention earlier (did I?) that the program's output when
> compiled on the Macintosh matched at all optimization levels.  (O0 == O1 ==
> O2) (Though the output did not match any output from the program compiled on
> linux.)  Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use
> those instructions which have more meaningful decimal places?

Yes, it's probably using the sse2 unit.

>         If this is the problem, what would be a good way of dealing with it?

Well, first realize that it's not a problem per se.  The results *are*
equivalent in the significant digits that a double can actually hold.
The only reason they seem different is the extra bits of precision that
the value carries while it is still in a 387 register.  Those bits
shouldn't matter, because as soon as the result is stored to memory as
a double they are rounded away.
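
To see the extra bits disappear, here is a minimal sketch that uses
long double as a stand-in for a value still sitting in a 387 register.
The function and macro names are made up for illustration, and whether
long double is actually wider than double depends on the platform.

```c
#include <float.h>

/* Assumption: long double only models a 387 register where it is
   really wider than double (true for the 80-bit x86 format). */
#define LONG_DOUBLE_IS_WIDER (LDBL_MANT_DIG > DBL_MANT_DIG)

/* Returns 1 if storing 1/3 into a double changes its value,
   i.e. if some bits were lost in the store. */
int narrowing_changes_value(void)
{
    /* volatile forces real stores instead of compile-time folding */
    volatile long double wide = 1.0L / 3.0L;
    volatile double narrow = (double)wide;

    return (long double)narrow != wide;
}
```

Where long double is wider, the function returns 1: the wide value and
the stored double differ only in bits a double cannot hold.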

> Throwing away the meaningless decimal digits is okay with me, but avoiding
> the performance hit that comes with ffloat-store would be nice.  Also, it

Like I said, you can use -mpc64 to explicitly set the 387 to double
(64-bit) precision, just like the sse2 unit.  If you don't have a gcc
new enough to have this option, or you don't want to depend on
requiring an option, you can simply configure the 387 manually at the
beginning of your program to disable the extended precision.  See
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323#c60> for a code snippet
of how to do this.  (That relies on a glibc-specific fpu_control.h
header, but the definitions in that header are pretty self-contained.)
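
A rough sketch of that approach, not a copy of the bugzilla snippet: it
assumes glibc on x86 or x86-64 and uses the macros from
<fpu_control.h>.  The function name and the HAVE_X87_CONTROL macro are
made up here; everywhere else the function is a no-op.

```c
#include <stdlib.h>   /* any libc header, so that __GLIBC__ gets defined */

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CONTROL 1

/* Switch the 387 precision-control field from extended to double.
   Returns 1 if the control word now selects double precision. */
int use_double_precision_387(void)
{
    fpu_control_t cw;

    _FPU_GETCW(cw);                            /* read the control word */
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* clear PC field, set double */
    _FPU_SETCW(cw);                            /* write it back */

    _FPU_GETCW(cw);                            /* re-read to confirm */
    return (cw & _FPU_EXTENDED) == _FPU_DOUBLE;
}
#else
#define HAVE_X87_CONTROL 0

/* No glibc/x86 control word available: do nothing. */
int use_double_precision_387(void)
{
    return 0;
}
#endif
```

You would call this once at the top of main().  It only affects code
that uses the 387 (sse2 math ignores it), and it also limits long
double arithmetic to double precision, so it is not free of side
effects.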

> would be nice to not have the output depend on compiler flags.

But the output doesn't *really* depend on compiler flags!  That's the
point I'm trying to make.  It only seems like the output differs because
you're looking at something that's the equivalent of uninitialized
memory.

Suppose you had a string buffer of 80 chars and you filled it with a
\0-terminated string of 40 chars, but to display it you print all 80
chars of the buffer.  Clearly two strings that have the same first 40
chars before the \0 are semantically equivalent as C strings, because
the rest of the buffer is just junk.  No reasonable programmer would
ever consider printing the junk past the \0 when displaying the string,
just like it's not reasonable to print more than 15 significant digits
of a double (that limit is DBL_DIG in <float.h>).
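
As a small illustration, the following sketch compares two doubles by
how they print at a given number of significant digits.  The helper
name is made up, and this is a way to see the effect rather than a
robust equality test: two values that straddle a rounding boundary can
still print differently at 15 digits.

```c
#include <float.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a and b produce the same text
   when printed with `digits` significant digits. */
int same_when_printed(double a, double b, int digits)
{
    char buf_a[64], buf_b[64];

    snprintf(buf_a, sizeof buf_a, "%.*g", digits, a);
    snprintf(buf_b, sizeof buf_b, "%.*g", digits, b);
    return strcmp(buf_a, buf_b) == 0;
}
```

With ordinary IEEE doubles, same_when_printed(0.1 + 0.2, 0.3, DBL_DIG)
is true, while asking for 17 digits exposes the difference in the last
bit (0.30000000000000004 vs 0.29999999999999999).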

This can also cause issues if you are simply testing for equality, e.g.
assert((x/y) == (x/y)) can sometimes fail simply because one result is
still in a register and the other has been stored to memory.  The
solution is to not use == for comparing floating point values, but
rather to compare the absolute value of their difference to some small
delta.  That is something you should do anyway with floating point
calculations, because they are inexact by design.  Some details at
<http://www.lahey.com/float.htm>.
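
A minimal sketch of such a comparison follows.  The helper name and
both tolerance parameters are made up for illustration.  The relative
check is an addition beyond the plain delta described above, because a
fixed delta alone works poorly for very large values.  Suitable
tolerances depend entirely on your calculation.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if a and b differ by at most abs_tol,
   or by at most rel_tol times the larger of their magnitudes.
   Returns 0 if either argument is NaN. */
int nearly_equal(double a, double b, double abs_tol, double rel_tol)
{
    double diff = fabs(a - b);
    double largest = fabs(a) > fabs(b) ? fabs(a) : fabs(b);

    if (diff <= abs_tol)            /* absolute delta: handles values near 0 */
        return 1;
    return diff <= rel_tol * largest;   /* relative delta: scales with size */
}
```

So instead of assert(p == q) you would write something like
assert(nearly_equal(p, q, 1e-12, 1e-12)).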

Brian