Andrew Haley a écrit :
It's possible that
res[i] += err[i] / k[i];
might give different results on x86 from
err[i] /= k[i];
res[i] += err[i];
due to intermediate 80-bit precision being used for the result.
How could I test this precision issue ?
I noticed that I didn't mention that k is constant.
One significant difference is that in one form err is modified and in
the other not. I checked the code and err is not used beyond that
instruction. In the next iteration err is cleared (memset) before
filling it with data.
I see a similar problem if the operation is
res[i] *= err[i]/k[i];
I also precomputed 1/k[i] and tried
res[i] *= err[i]*k[i];
Same problem and exactly same intermediate results as with the initial form.
I also tested this and it fails too
const float tmp = err[i] / k[i];
res[i] += tmp;
I guess the compiler optimizes the tmp away.
However, if you have a 5 Gb matrix I'm assuming you are on a 64-bit system,
and 64-bit platforms don't use the 80-bit intermediate precision.
Do you mean the hardware or the software ? How is this possible ?
I'm using the 64bit Ubuntu distribution and 64bit g++ and icc.
What does "uname -a" say?
Linux XXX 2.6.24-19-generic #1 SMP Wed Aug 20 17:53:40 UTC 2008 x86_64
GNU/Linux