Brian Budge wrote:
> Hmmmm, yeah... I guess it shouldn't make an 8.5 time difference.
>
> You're right, it should be a smaller difference. The squaring is 2
> muls and an add vs 4 muls and 2 adds, so it should be less than twice
> as fast when you include the other portion.
>
> The instruction ordering might be more optimal also.
>
> 8.5 times though... weird.

Looks like the problem can be solved by manually inlining the definition of "norm"...

    //manually inlining "norm" results in a 5x-7x speedup on g++
    for(int i=0; i<iter and (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
        Z = Z*Z + C;

...For some reason g++ must not have been able to inline it (or does so after common subexpression elimination or somesuch).

Greg Buchholz
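For anyone who wants to compare the two loop conditions side by side, here is a minimal sketch. The function names (iterations_with_norm, iterations_inlined), the choice of std::complex<double>, and the starting value Z = 0 are my assumptions for illustration, not taken from the original program. It only shows that the two forms do the same job; it does not reproduce the timing.

```cpp
#include <complex>

// Hypothetical helper: escape test written with std::norm.
int iterations_with_norm(std::complex<double> C, int iter, double limit_sqr)
{
    std::complex<double> Z(0.0, 0.0);   // assumed starting point
    int i = 0;
    for (; i < iter && std::norm(Z) <= limit_sqr; ++i)
        Z = Z*Z + C;
    return i;                           // number of iterations performed
}

// Hypothetical helper: same loop with "norm" expanded by hand,
// i.e. real^2 + imag^2 written directly in the loop condition.
int iterations_inlined(std::complex<double> C, int iter, double limit_sqr)
{
    std::complex<double> Z(0.0, 0.0);   // assumed starting point
    int i = 0;
    for (; i < iter && (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
        Z = Z*Z + C;
    return i;                           // number of iterations performed
}
```

Calling both with the same C, iter and limit_sqr and timing each over a grid of points should show whether a given compiler and set of flags treats them differently; the 5x-7x figure above is specific to the g++ build I tried.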