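Hello Dario,

I haven't tried your code yet, but I think you could get a good boost if you replace the "_mm_sqrt_pd" call with "_mm_sqrt_sd", since you only need the square root of a scalar (the low element, which is the only one _mm_store_sd writes out). Note that _mm_sqrt_sd takes two operands: it computes the square root of the low element of the second operand and copies the high element from the first. The call therefore becomes xmm0 = _mm_sqrt_sd(xmm0, xmm0);

Dario
>
> inline static double dist_sse(int i,int j)
> {
>     double d;
>     __m128d xmm0,xmm1;
>     xmm0 = _mm_load_pd(C[i]);
>     xmm1 = _mm_load_pd(C[j]);
>     xmm0 = _mm_sub_pd(xmm0,xmm1);
>     xmm1 = xmm0;
>     xmm0 = _mm_mul_pd(xmm0,xmm1);
>     xmm1 = _mm_shuffle_pd(xmm0, xmm0, _MM_SHUFFLE2(1, 1));
>     xmm0 = _mm_add_pd(xmm0,xmm1);
>     xmm0 = _mm_sqrt_pd(xmm0);
>     _mm_store_sd(&d,xmm0);
>     return rint(d);
> }
>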