Andrew Haley wrote:
Hazael Maldonado Torres wrote:
To my surprise, if I swap the operands of the multiplication from:
ln_2z = log(2.0) * cal_heterozygosity(dist_matrix, table_type);
to:
ln_2z = cal_heterozygosity(dist_matrix, table_type) * log(2.0);
then the problem disappears.
My main concern is whether the way I have coded cal_heterozygosity()
conflicts with GCC or whether there is a bug in GCC. If it is indeed my
mistake, I would like to know what is wrong with the coding, as I may be
using the same conventions to code other functions in my library. Or, is it
possible to get access to the source code once it has been optimised?
I only saw the quote, not the original question (Andrew replied on a
different list than the one Hazael asked on), so maybe I missed
something, but:
You haven't told us many important things:
Architecture? I assume 32-bit x86, but it helps to tell us.
Data type? (float or double)
How much did the answer change? (ln_2z one way vs the other)
Still, I'll make a good guess at the underlying issue (because I've seen
it so often).
On the 32-bit x86 architecture, most floating-point operations are done
by the x87 unit in 80-bit extended precision. When a value is stored to
memory it is rounded to 32 or 64 bits, but with optimization enabled
some values are never stored to memory and keep the full 80 bits.
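Here is a minimal sketch of that rounding step (my own illustration,
not Hazael's code). It uses long double explicitly to stand in for the
80-bit register and double for the copy stored to memory, so it shows
the lost bits deterministically on x86 with GCC, where long double is
the 80-bit extended type:

#include <stdio.h>

int main(void)
{
    long double reg = 1.0L / 3.0L; /* what sits in the 80-bit x87 register */
    double mem = (double)reg;      /* what lands in memory: rounded to a 64-bit double */

    /* Reloading the stored copy does not bring the discarded bits back. */
    printf("register copy: %.21Lg\n", reg);
    printf("memory copy:   %.21Lg\n", (long double)mem);
    printf("difference:    %Lg\n", reg - (long double)mem);
    return 0;
}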
I expect the optimizer cannot see what a human would see, namely that
log(2.0) could be a compile-time constant. So the optimized code rounds
the 80-bit return value of one function to 64 (or 32) bits to store it
in memory, then gets the 80-bit return value of the other function,
reloads the stored value, and multiplies. The result depends on which
of the two return values was rounded from 80 bits to 64. Even if the
product is immediately rounded to 64 bits, it can be a different 64-bit
value depending on which operand was rounded before the multiplication.
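To make that concrete, here is a small sketch of the two operand orders
(g() and h() are arbitrary stand-ins of my own, not Hazael's functions;
the explicit (double) cast plays the role of the store to memory, and
long double plays the role of the 80-bit register):

#include <stdio.h>
#include <math.h>

static long double g(void) { return logl(2.0L); }  /* stand-in for log(2.0) */
static long double h(void) { return 1.0L / 3.0L; } /* stand-in for cal_heterozygosity(...) */

int main(void)
{
    /* Order 1: g()'s return value is the one rounded to a 64-bit double
       before the multiply; h()'s return value keeps its 80 bits.        */
    double v1 = (double)((double)g() * h());

    /* Order 2: the roles are swapped; now h()'s return value is the one
       rounded before the multiply.                                      */
    double v2 = (double)(g() * (double)h());

    printf("v1 = %.17g\nv2 = %.17g\n", v1, v2);
    printf(v1 == v2 ? "identical this time\n"
                    : "they differ in the last bits\n");
    return 0;
}

Whether these two particular values actually differ depends on the bit
patterns involved, but they are genuinely different computations, which
is all it takes for the last bit or two of ln_2z to move when the
operands are swapped.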