<fwd from Ian Taylor's response>

"Jack Andrews" <effbiae@xxxxxxxxx> writes:

    simd_mmintrin(n, is) I *is;
    {
        __v2si q,r;
        I i;
        _m_empty();
        q=_m_from_int(0);
        for (i=0; i < n; i+=W) {
            memcpy(&r,is+i,IZ*W);
            q=_m_paddd(q,r);
        }
        I*qq=(I*)&q;
        return qq[0]+qq[1];
    }

For mmintrin.h functions, use __m64, not __v2si.

Why the memcpy? Use _mm_set_pi32(is[i], is[i + 1]).

Don't extract the values by taking the address of q. Instead do
something like this:

    union { long ai[2]; __m64 m; } u;
    u.m = q;
    return u.ai[0] + u.ai[1];

Ian
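
Putting the three suggestions together, a minimal sketch might look like the following. It is not from the original thread. The function name simd_sum_mmx, the use of plain int in place of the I typedef, the scalar handling of an odd trailing element, and the placement of _mm_empty() after the MMX work are all assumptions. The union uses int rather than long for the lanes, since long is 64 bits on LP64 targets and would not overlay the two 32-bit halves there.

```c
#include <mmintrin.h>
#include <stddef.h>

/* Hypothetical helper: sum n ints, two 32-bit lanes at a time, using MMX.
 * Assumes a 32-bit int and an x86 target with MMX available. */
int simd_sum_mmx(size_t n, const int *is)
{
    /* Union used to read the lanes back without casting &q to int *. */
    union { int ai[2]; __m64 m; } u;
    __m64 q = _mm_setzero_si64();   /* both lanes start at zero */
    size_t i;
    int total;

    /* Build each pair with _mm_set_pi32 instead of memcpy.
     * Lane order does not affect the final sum. */
    for (i = 0; i + 1 < n; i += 2)
        q = _mm_add_pi32(q, _mm_set_pi32(is[i], is[i + 1]));

    u.m = q;
    total = u.ai[0] + u.ai[1];

    /* Assumption: clear MMX state once the MMX work is done, so later
     * x87 floating-point code is not affected. */
    _mm_empty();

    /* Assumption: an odd trailing element is added in scalar code. */
    if (i < n)
        total += is[i];

    return total;
}
```

With an array holding 1, 2, 3, 4, 5, calling simd_sum_mmx(5, a) adds the pairs (1, 2) and (3, 4) in the vector loop and the final 5 in scalar code.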