Recently, I used OProfile to track down a performance problem that I
think is the same one you are seeing, although the GCC/Intel performance
ratio was MUCH higher in the case I was testing.
If your issue turns out to be something different, I suggest OProfile as
the best tool to identify it.
If it is the same issue, the code generated by the compiler is not what
matters; it is something strange that happens inside the GNU version of
exp() in libm that doesn't happen in the Intel version of exp() in
libimf.
To work around this problem, I link in Intel's libimf ahead of libm even
when using the gcc compiler.
I haven't had time to dig through the source code of GNU exp() to figure
out what is really going on, but both OProfile and gdb indicated that
exp() sometimes calls out to a VERY slow multi-precision routine. A
single exp() call that takes that path can be a thousand times slower
than the Intel version. The overall performance ratio is then determined
by what fraction of your exp() calls cause the GNU exp() code to fall
back to the super slow path.
If any experts reading this thread have a better understanding of the
issue, I'd like to hear the explanation. I didn't investigate much
beyond what I described above.
jackfrost wrote:
// very simple array function calculation:
#include <math.h>
#include <time.h>
#include <stdio.h>

static double A[50000000];

int main(void)
{
    for (int t = 0; t < 50000000; t++)
        A[t] = 5.55 * sin(t);   // pseudo-random data in [-5.55, 5.55]

    clock_t time0 = clock();    // clock() returns clock_t, not time_t
    for (int t = 0; t < 50000000; t++)
        A[t] = exp(A[t]);
    printf("%g\n", ((double)(clock() - time0)) / CLOCKS_PER_SEC);
    return 0;
}
Time for this code compiled with the Intel 10 compiler: 1.2 s.
Time for the same code compiled with GCC (v3 and v4): 7.2 s.
I've tried all the optimization options I could think of (-mfpmath=sse
-msse2 -O3 -mtune=pentium-m), but Intel is still about six times faster.