2012/2/17 James Cloos <cloos@xxxxxxxxxxx>:
>>>>>> "MB" == Miles Bader <miles@xxxxxxx> writes:
>
>>> But in my experience, -mfpmath=sse will slow my code very much.
>
> MB> Hmm, I've always found SSE FP to be a speedup -- sometimes a _big_
> MB> speedup -- over 387 FP, at least when one is using mostly primitive
> MB> FP operations (mul, divide, sqrt, etc) ... I think it's worth
> MB> testing, at least.
>
> Many years ago, when I asked about using -mfpmath=sse on an ia32 box, the
> advice was that, because the function args and return values had to be
> passed on the 387 stack, most code would be much slower.

I suppose whether that is a significant factor depends on the actual
content of the functions.

In general, I'd think there shouldn't be a whole lot of function calling
going on in the inner loop unless the functions in question actually do
something non-trivial (I think this is especially true for a lot of
FP-intensive coding styles, where somewhat more attention is paid to
throughput, and a bit less to things like abstraction), and the more a
function does, the less impact the function call itself has.

So a speed increase in primitive operations should make up for some
extra per-call overhead.

> Some of the new chips seem to have specific optimizations to deal with
> code which constantly moves values between registers and the stack, so
> it is probably less of an issue on newer chips than it used to be.

My earlier observation is based on benchmarks run mostly on P3-era CPUs
(the last time I used the traditional x86 ABI much).  I dunno how
representative that is...

> But if one is using a newer chip, why not upgrade to -m64, too?

Totally :]

-miles

--
Cat is power.  Cat is peace.
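
For anyone who wants to test the call-overhead question on their own machine, here is a minimal sketch. It is not from the thread: the function names and the arithmetic are hypothetical, picked only so that one loop makes a real call per element while the other does the same primitive operations inline. It assumes GCC (or a compiler that accepts the GCC noinline attribute).

```c
/* Hypothetical micro-benchmark kernels; all names here are illustrative.
 * Build the same file both ways on an ia32 target and time each loop:
 *   gcc -m32 -O2 -mfpmath=387 -c kernels.c
 *   gcc -m32 -O2 -msse2 -mfpmath=sse -c kernels.c
 */
#include <stddef.h>

/* noinline keeps a real call in the loop.  With the traditional ia32
 * calling convention, the double result comes back in x87 register
 * st(0) even when the arithmetic itself is done with SSE. */
__attribute__((noinline))
double scale_ratio(double x, double scale)
{
    return (x * scale) / (x + 1.0);
}

/* Inner loop dominated by calls to a tiny function. */
double sum_with_calls(const double *v, size_t n, double scale)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += scale_ratio(v[i], scale);
    return acc;
}

/* Same arithmetic, written inline with primitive operations only. */
double sum_inline(const double *v, size_t n, double scale)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (v[i] * scale) / (v[i] + 1.0);
    return acc;
}
```

If the SSE build is faster on sum_inline but no better (or worse) on sum_with_calls, that would point at the per-call cost discussed above. Timing code is left out on purpose, since the numbers will depend heavily on the CPU and the compiler version.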