Re: Performance and rounding (?) problem ?

Andrew Haley <aph@xxxxxxxxxx> · Thu, 18 Sep 2008 11:20:26 +0100

Christophe Meessen wrote:

> I'm implementing a program do solve inverse problem by iteration but
> with a huge matrix (5GB).
> 
> I'm using g++ 4.2.3 on Ubuntu (Hardy) with an Intel E8400 processor and
> compared results with icc 10.1.018.
> 
> I have noticed two weird things
> 
> 1° processing time:
>   with g++ iteration time slowly increases and with icc it doesn't.
> Though I use exactly same program, same data and same computer. I did
> many tests and this is consistent through all tests.
> 
> For instance it starts at 22s per iteration for g++ and icc then g++
> code slows down to 24s after 10 iterations and then every two or three
> iteration gains one second.
> 
> Note that this code is dominated by memory access since I do huge matrix
> multiplication (float).
> 
> 
> 2° possible rounding problem :
> 
> at the end of each iteration I update the result matrix with the
> following instruction
> 
> for( int i = 0; i < n; ++i )
>   res[i] += err[i] / k[i];
> 
> but with this code I loose convergence after a few iterations.
> When changine this code to this
> 
> for( int i = 0; i < n; ++i )
> {
>   err[i] /= k[i];
>   res[i] += err[i];
> }
> 
> or this
> 
> for( int i = 0; i < n; ++i )
>   err[i] /= k[i];
> 
> for( int i = 0; i < n; ++i )
>   res[i] += err[i];
> 
> The program converges nicely without problem.
> 
> I saw the same problem with this type of result update code too
> 
> for( int i = 0; i < n; ++i )
>   res[i] *= err[i] / k[i];
> 
> I checked with -O2 and -O3 and same effect.
> 
> I tested on another processor (Xeon quadcore E5345) with g++ and icc,
> same behavior.
> 
> I want to understand what is going on here because it is a vicious
> "feature". I spent weeks trying to figure out why the program wouldn't
> converge until I split the instruction to check intermediate results.
> 
> Could this be due to a rounding "feature" ?

It's possible that

  res[i] += err[i] / k[i];

might give different results on x86 from

  err[i] /= k[i];
  res[i] += err[i];

due to intermediate 80-bit precision being used for the result.

However, if you have a 5 Gb matrix I'm assuming you are on a 64-bit system,
and 64-bit platforms don't use the 80-bit intermediate precision.

What does "uname -a" say?

Andrew.