David Palao wrote:
__asm__ __volatile__ ("movsd %0, %%xmm3 \n\t" \
"movsd %1, %%xmm6 \n\t" \
"movsd %2, %%xmm4 \n\t" \
"movsd %3, %%xmm7 \n\t" \
"movsd %4, %%xmm5 \n\t" \
"unpcklpd %%xmm3, %%xmm3 \n\t" \
"unpcklpd %%xmm6, %%xmm6 \n\t" \
"unpcklpd %%xmm4, %%xmm4 \n\t" \
"mulpd %%xmm0, %%xmm3 \n\t" \
....
"addpd %%xmm6, %%xmm5 \n\t" \
"addpd %%xmm7, %%xmm3 \n\t" \
"movsd %7, %%xmm6 \n\t" \
"movsd %8, %%xmm7 \n\t" \
"unpcklpd %%xmm6, %%xmm6 \n\t" \
"unpcklpd %%xmm7, %%xmm7 \n\t" \
"mulpd %%xmm1, %%xmm6 \n\t" \
"mulpd %%xmm2, %%xmm7 \n\t" \
"addpd %%xmm6, %%xmm4 \n\t" \
"addpd %%xmm7, %%xmm5" \
Don't write it this way. Use the SSE2 builtins directly and the
compiler can handle all the register allocation for you. You'll
have to be careful to arrange for no more than 8 SSE values
to be live at one time, though, since there are only 8 xmm
registers and anything beyond that gets spilled to the stack.
That's not too hard to achieve if you're careful. I had success
using this technique to do some 2D FFTs; it was way simpler than
writing assembly directly.
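
As a rough sketch of what that looks like (not a translation of your
full kernel, which is elided above): the movsd/unpcklpd pair becomes
_mm_set1_pd, and mulpd/addpd become _mm_mul_pd/_mm_add_pd from
<emmintrin.h>. The function name mat3_mul_packed and the 3x3 layout
are made up for illustration; I'm assuming each row is a sum of three
broadcast-scalar times packed-vector products. Build with -msse2 on
32-bit x86.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_set1_pd, ... */

/* Hypothetical helper: out[i] = sum over j of m[i][j] * v[j], where
   each v[j] holds two packed doubles. The compiler picks the xmm
   registers; nothing here names one explicitly. */
static void mat3_mul_packed(const double m[3][3], const __m128d v[3],
                            __m128d out[3])
{
    for (int i = 0; i < 3; i++) {
        /* _mm_set1_pd copies one scalar into both lanes, the same
           effect as movsd followed by unpcklpd on the same register. */
        __m128d acc = _mm_mul_pd(_mm_set1_pd(m[i][0]), v[0]);
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_set1_pd(m[i][1]), v[1]));
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_set1_pd(m[i][2]), v[2]));
        out[i] = acc;
    }
}
```

Only acc, the three inputs and one broadcast temporary are live at any
point here, so it stays well under the 8-register limit.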
nathan
--
Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC
nathan@xxxxxxxxxxxxxxxx :: http://www.planetfall.pwp.blueyonder.co.uk