moving data between x87 and xmm registers

"Gautam Sewani" <gautamcool88@xxxxxxxxx> · Thu, 5 Jun 2008 12:09:33 +0530

Hi,
I am trying to vectorize a piece of code using SSE2 intrinsics (the
ones in emmintrin.h). I am using double-precision floating-point
arithmetic. The running times I obtained were very similar with and
without the vectorization. I suspect the reason is that in
the vectorized code, I am storing the contents of a packed xmm
register (represented by an __m128d variable) into a double array.
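
A minimal sketch of the pattern described above, not the actual code: the
function names add_pairs and sum_pairs are hypothetical, and the unaligned
load/store intrinsics are an assumption made so that the example works
with any double array. The packed result is stored to memory and then
read back by ordinary scalar code, which is the point where the compiler
may choose x87 or SSE instructions depending on the target and flags.

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical helper: adds two pairs of doubles with SSE2 and stores
   the packed result (an __m128d) into a plain double array. */
static void add_pairs(const double *a, const double *b, double *out)
{
    assert(a && b && out);

    __m128d va  = _mm_loadu_pd(a);      /* load a[0], a[1] (no alignment required) */
    __m128d vb  = _mm_loadu_pd(b);      /* load b[0], b[1] */
    __m128d sum = _mm_add_pd(va, vb);   /* packed double-precision add */

    _mm_storeu_pd(out, sum);            /* xmm register -> memory */
}

/* Hypothetical scalar follow-up: reads the stored values back as
   ordinary doubles. Whether this addition uses x87 or SSE instructions
   is decided by the compiler, not by this source code. */
static double sum_pairs(const double *a, const double *b)
{
    double tmp[2];
    add_pairs(a, b, tmp);
    return tmp[0] + tmp[1];
}
```

Inspecting the output of gcc -S for a file like this shows which
instructions are emitted for the scalar addition in sum_pairs.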

Looking at the generated assembly code, I saw that the
contents of the xmm register were first saved to a memory location and
then loaded onto the x87 FPU stack. Apparently there is no direct way
to transfer data between x87 and xmm registers. One way to eliminate
this would be to use xmm registers for all floating-point
calculations. But in spite of using -march=prescott and -mfpmath=sse,
x87 instructions like fld and fstp are still used. Is there any way to
force GCC to use only the xmm registers for all floating-point
calculations? (I tried using the -mno-80387 option, but I am getting
lots of weird linker errors with that.) Or is there any way to move
data between x87 and xmm registers without using memory as an
intermediary?

Regards
Gautam