Dear experts,
I hope I'm posting this to the right mailing list as STL's vector is
partly involved. If not, any pointers to a more appropriate place would
be appreciated.
I'm a PhD student at the Institute for Numerical Simulation at Bonn
University. In our group we employ a self-written C++ library for our
numerical computations. This library also includes a set of "typical"
programs used for benchmarking compilers or assessing effects of new
compiler flags.
When gcc-4.6 was released, we noticed a massive performance regression in
one of these benchmark problems. However, we did not investigate further
and waited for gcc-4.7 instead. Unfortunately, the problem persisted.
Digging deeper, we produced a minimal stand-alone example, which I'm
attaching to this mail.
What you see there is just a matrix-vector multiplication, repeated 1000
times. However, the matrix has a highly specific structure that arises
when performing numerical computations using the Finite Element Method
(FEM), namely:
    std::vector<MinimalVec3> rows[9];
Thus it consists of 9 bands of triples of doubles. The length of each
band corresponds to the length of the vector it is applied to.
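To make the band layout concrete, here is a minimal sketch of which vector entries a single matrix row touches. It is not taken from our library: nodeIndex and columnsOfRow are hypothetical names, and the index arithmetic is simply read off the loop nest in the attached example.

```cpp
#include <cassert>
#include <vector>

// Linear index of grid node (i, k, m) on a w*w*w grid, as in the attachment.
int nodeIndex(int w, int i, int k, int m) {
    return i * w * w + k * w + m;
}

// Hypothetical helper: columns of the 27 non-zero entries in the matrix row
// of interior node (i, k, m). Entry n of band j*3 + l belongs to column
// nodeIndex(w, i + j - 1, k + l - 1, m + n - 1).
std::vector<int> columnsOfRow(int w, int i, int k, int m) {
    assert(i >= 1 && i < w - 1 && k >= 1 && k < w - 1 && m >= 1 && m < w - 1);
    std::vector<int> cols;
    cols.reserve(27);
    for (int j = 0; j < 3; ++j)          // neighbour offset in i
        for (int l = 0; l < 3; ++l)      // neighbour offset in k
            for (int n = 0; n < 3; ++n)  // neighbour offset in m
                cols.push_back(nodeIndex(w, i + j - 1, k + l - 1, m + n - 1));
    return cols;
}
```

So each row has 9 bands times 3 entries, i.e. 27 non-zeros, one per neighbour in a 3x3x3 stencil. For w = 3 the only interior node is (1, 1, 1), and its row couples to all 27 grid nodes.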
Compiling with gcc-4.5.0 (our standard compiler), the 'time' command reports:
    1m13.246s
Using gcc-4.7.0 we get:
    2m6.623s
After removing the member variable "double stuff" from MinimalVector we get:
    1m9.636s
Using a C array instead of the std::vector above resolves the issue as well.
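For illustration, the following sketch shows what is meant by swapping the container. It assumes that "C array" means one plain heap array per band. Vec3::set, applyAdd, fillBands and the two centreWith... functions are hypothetical names, and img and res are plain double buffers here rather than MinimalVector, so the sketch only shows the storage change and does not reproduce the timings.

```cpp
#include <cassert>
#include <vector>

// Same layout as MinimalVec3 in the attachment; set() is a hypothetical
// addition so that the sketch can fill in coefficients.
class Vec3 {
    double coords[3];
public:
    Vec3() { for (int i = 0; i < 3; ++i) coords[i] = 0.; }
    void set(int I, double v) { coords[I] = v; }
    const double& operator[](int I) const { return coords[I]; }
};

// Loop nest of the attached example, templated on the band container so that
// "std::vector<Vec3> rows[9]" and "Vec3* rows[9]" run through the same code.
template <typename Rows>
void applyAdd(const Rows& rows, int w, const double* img, double* res) {
    assert(w >= 3);
    const int wsqr = w * w;
    for (int i = 1; i < w - 1; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 1; k < w - 1; ++k)
                for (int l = 0; l < 3; ++l)
                    for (int m = 1; m < w - 1; ++m)
                        for (int n = 0; n < 3; ++n)
                            res[i*wsqr + k*w + m] +=
                                img[(i + j - 1)*wsqr + (k + l - 1)*w + m + n - 1]
                                * rows[j*3 + l][i*wsqr + k*w + m][n];
}

// Arbitrary test coefficients: entry n of band b is set to b + n.
template <typename Rows>
void fillBands(Rows& rows, int numNodes) {
    for (int b = 0; b < 9; ++b)
        for (int idx = 0; idx < numNodes; ++idx)
            for (int n = 0; n < 3; ++n)
                rows[b][idx].set(n, b + n);
}

// Variant from the attachment: bands stored in std::vector.
double centreWithVectors() {
    const int w = 3, wcub = 27;
    std::vector<Vec3> rows[9];
    for (int b = 0; b < 9; ++b) rows[b].resize(wcub);
    fillBands(rows, wcub);
    std::vector<double> img(wcub), res(wcub, 0.);
    for (int p = 0; p < wcub; ++p) img[p] = p;
    applyAdd(rows, w, &img[0], &res[0]);
    return res[13];  // node (1, 1, 1), the only interior node for w = 3
}

// Assumed C-array variant: bands stored in plain heap arrays.
double centreWithArrays() {
    const int w = 3, wcub = 27;
    Vec3* rows[9];
    for (int b = 0; b < 9; ++b) rows[b] = new Vec3[wcub];
    fillBands(rows, wcub);
    std::vector<double> img(wcub), res(wcub, 0.);
    for (int p = 0; p < wcub; ++p) img[p] = p;
    applyAdd(rows, w, &img[0], &res[0]);
    for (int b = 0; b < 9; ++b) delete[] rows[b];
    return res[13];
}
```

Both variants use identical indexing, rows[band][node][n], so only the declaration and the allocation of rows differ.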
This is probably a lot to ask, but do you have any idea what could be
causing this problem and how we could prevent it?
We could of course use another matrix class, but among the matrix
implementations we compared (using gcc-4.5), this one performs best.
We'd be grateful for any advice.
Best regards
Benedict
#include <vector>

// Triple of doubles: one entry of a matrix band.
class MinimalVec3
{
protected:
  double coords[3];
public:
  MinimalVec3( ) {
    for ( int i = 0; i < 3; ++i )
      coords[i] = 0.;
  }
  inline const double& operator[] ( int I ) const {
    return coords[I];
  }
};
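
// Bare-bones vector of doubles, zero-initialized on construction.
// Note: there is no destructor; the buffer lives until program exit.
class MinimalVector
{
protected:
  double *_pData;
  double stuff; // EVIL: removing this member makes the slowdown disappear
public:
  explicit MinimalVector ( int length ) {
    _pData = new double[length];
    for (int i = 0; i < length; ++i) _pData[i] = 0.;
  }
  inline double& operator[] ( int I ) {
    return _pData[I];
  }
  inline const double& operator[] ( int I ) const {
    return _pData[I];
  }
};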

int main ( int /*argc*/, char** /*argv*/ ) {
  // Grid with w = 129 nodes per direction.
  int w = ( 1 << 7 )+1;
  int wsqr = w*w;
  int wcub = w*w*w;

  // The matrix: 9 bands, each holding one triple of doubles per grid node.
  std::vector<MinimalVec3> rows[9];
  for ( int i = 0; i < 9; ++i ) {
    rows[i].resize ( wcub );
  }

  MinimalVector img ( wcub ), res ( wcub );

  // Apply the matrix 1000 times: res += matrix * img (interior nodes only).
  for ( int c = 0; c < 1000; ++c ) {
    // matrix.applyAdd( img, res );
    for ( int i = 1; i < w-1; ++i )
      for ( int j = 0; j < 3; ++j ) {
        // matrix.subApplyAdd ( i*wsqr, ( i + j - 1 ) *wsqr, j, img, res );
        for ( int k = 1; k < w - 1; ++k )
          for ( int l = 0; l < 3; ++l ) {
            // matrix.tripleDiagApplyAdd ( i*wsqr + k*w, ( i + j - 1 ) *wsqr + ( k + l - 1 ) *w, j*3 + l, img, res );
            for ( int m = 1; m < w - 1; ++m )
              for ( int n = 0; n < 3; ++n )
                res[i*wsqr + k*w + m] +=
                  img[( i + j - 1 ) *wsqr + ( k + l - 1 ) *w + m + n - 1]
                  * rows[j*3 + l][i*wsqr + k*w + m][n];
          }
      }
  }
  return 0;
}