In my experience, SSE is generally more useful when you can lay out your
data as SOA (structure of arrays) rather than AOS (array of structures).
If you expect a speedup from processing individual pairs of doubles, I
doubt you'll see much improvement except in extreme situations, or when
the compiler happens to detect a pattern in your code. Also, shuffles
and similar data-rearranging instructions are killers.

It would be much better if you had 10000 of these distances to compute
at once, and could lay out the data in a way that is friendlier to SSE
(SOA).

Brian

On Mon, Apr 7, 2008 at 9:08 AM, Dario Bahena Tapia <dario.mx@xxxxxxxxx> wrote:
> Hello,
>
> I tried with your options but it seems to make no difference. In
> another email it was suggested to use _mm_sqrt_sd, because I only
> needed one sqrt calculation. That improved time and indeed, almost
> reach serial version (now it runs up to 1 second slower for the 10,000
> data example, hehe).
>
> But of course, I would wanna/expect the vector version to run faster
> ... still unsure how to achieve that.
>
> Thanks
>
>
> On Mon, Apr 7, 2008 at 10:23 AM, jlh <jlh@xxxxxx> wrote:
> > Dario Bahena Tapia wrote:
> >
> > >
> > > inline static double dist(int i,int j)
> > > {
> > >   double xd = C[i][X] - C[j][X];
> > >   double yd = C[i][Y] - C[j][Y];
> > >   return rint(sqrt(xd*xd + yd*yd));
> > > }
> > > [...]
> > >
> > > And in order to activate the SSE2 features, I am using the following
> > > flags for gcc (my computer is a laptop):
> > >
> > > CFLAGS = -O -Wall -march=pentium-m -msse2
> > >
> >
> > These options do not make dist() use any SSE for me. Have you
> > tried compiling with this?
> >
> > CFLAGS = -O2 -Wall -march=pentium-m -mfpmath=sse
> >
> > I think -msse2 is redundant if you say -march=pentium-m. I don't
> > have an SSE2 machine to try this though.
> >
> > jlh
> >
>
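
A minimal sketch of the SOA suggestion at the top of this message, not
taken from the original program. It assumes the coordinates can be kept
in two separate arrays, and that you want the distances from one point
to all the others in a single pass. The names points_soa, dist_from and
sqrt_scalar are hypothetical, and the rint() rounding from the original
dist() is left out for brevity. Whether this actually beats the serial
version would have to be measured.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical SOA layout: all x values contiguous, all y values
   contiguous (the original code used AOS: C[i][X], C[i][Y]). */
typedef struct {
    const double *x;
    const double *y;
    size_t n;
} points_soa;

/* Scalar square root; uses _mm_sqrt_sd when SSE2 is available. */
static double sqrt_scalar(double v)
{
#if defined(__SSE2__)
    return _mm_cvtsd_f64(_mm_sqrt_sd(_mm_setzero_pd(), _mm_set_sd(v)));
#else
    return sqrt(v);
#endif
}

/* Writes the distance from point i to every point j into out[j].
   Unlike the original dist(), the result is NOT rounded with rint(). */
static void dist_from(const points_soa *p, size_t i, double *out)
{
    assert(p && p->x && p->y && out && i < p->n);
    size_t j = 0;
#if defined(__SSE2__)
    /* Broadcast point i once, then handle two points j per iteration.
       No shuffles are needed because x and y live in separate arrays. */
    const __m128d xi = _mm_set1_pd(p->x[i]);
    const __m128d yi = _mm_set1_pd(p->y[i]);
    for (; j + 2 <= p->n; j += 2) {
        __m128d xd = _mm_sub_pd(xi, _mm_loadu_pd(p->x + j));
        __m128d yd = _mm_sub_pd(yi, _mm_loadu_pd(p->y + j));
        __m128d d2 = _mm_add_pd(_mm_mul_pd(xd, xd), _mm_mul_pd(yd, yd));
        _mm_storeu_pd(out + j, _mm_sqrt_pd(d2));
    }
#endif
    /* Scalar tail (or the whole loop when SSE2 is not available). */
    for (; j < p->n; ++j) {
        double xd = p->x[i] - p->x[j];
        double yd = p->y[i] - p->y[j];
        out[j] = sqrt_scalar(xd * xd + yd * yd);
    }
}
```

With the data in this form, one call such as dist_from(&p, i, out) does
the work of n calls to dist(i, j). The unaligned loads (_mm_loadu_pd)
keep the sketch safe for any array; if the arrays are 16-byte aligned,
_mm_load_pd could be tried instead.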