Andrew Haley wrote:
Well, that's interesting. Somehow, the Intel compiler is four times
faster than gcc even when gcc is generating straight inlined floating-point
instructions. Something very odd is going on here. I'm wondering if we're
looking at the right thing. For example, we've been assuming that the run
time is dominated by the time of exp().
I'm sure the measured run time (the measurement leaves out the use of
sin()) is dominated by the time of exp().
I'm pretty sure the measured run time is dominated by the f2xm1
instruction inside exp().
The Intel version of exp() does not use f2xm1. It uses a much more
complicated method involving far more instructions. But none of those
instructions are anywhere near as slow as f2xm1.
Judging by the results, it would seem both jackfrost and I are testing
on hardware where Intel's choice (to use a large combination of faster
instructions instead of f2xm1) is correct.