Hello,

I think I concur. Indeed, the original program had a structure of arrays
(each coordinate in a separate array). I will try to use SSE2 with that
layout, although I think sqrt will still be the bottleneck ... maybe I
could also use another norm function (like maximum or taxicab).

Thanks.

On Mon, Apr 7, 2008 at 5:51 PM, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> In my experience, SSE is generally more useful when you can optimize
> your structures as SOA (struct of array) vs AOS (array of struct). If
> you expect a speed up by doing individual groups of pairs of doubles,
> I doubt you'll see much improvement except in extreme situations, or
> when the compiler might detect a pattern in your code. Also, shuffles
> etc... are killers.
>
> Much better would be if you had 10000 of these things to take
> distances at once, and you could lay out the data friendlier for SSE
> (SOA).
>
> Brian
>
> On Mon, Apr 7, 2008 at 9:08 AM, Dario Bahena Tapia <dario.mx@xxxxxxxxx> wrote:
> > Hello,
> >
> > I tried with your options but it seems to make no difference. In
> > another email it was suggested to use _mm_sqrt_sd, because I only
> > needed one sqrt calculation. That improved time and indeed, almost
> > reach serial version (now it runs up to 1 second slower for the 10,000
> > data example, hehe).
> >
> > But of course, I would wanna/expect the vector version to run faster
> > ... still unsure how to achieve that.
> >
> > Thanks
> >
> > On Mon, Apr 7, 2008 at 10:23 AM, jlh <jlh@xxxxxx> wrote:
> > > Dario Bahena Tapia wrote:
> > >
> > > > inline static double dist(int i,int j)
> > > > {
> > > >   double xd = C[i][X] - C[j][X];
> > > >   double yd = C[i][Y] - C[j][Y];
> > > >   return rint(sqrt(xd*xd + yd*yd));
> > > > }
> > > > [...]
> > > >
> > > > And in order to activate the SSE2 features, I am using the following
> > > > flags for gcc (my computer is a laptop):
> > > >
> > > > CFLAGS = -O -Wall -march=pentium-m -msse2
> > >
> > > These options do not make dist() use any SSE for me. Have you
> > > tried compiling with this?
> > >
> > > CFLAGS = -O2 -Wall -march=pentium-m -mfpmath=sse
> > >
> > > I think -msse2 is redundant if you say -march=pentium-m. I don't
> > > have an SSE2 machine to try this though.
> > >
> > > jlh
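
For reference, here is a minimal sketch of the SOA approach discussed above: with x and y coordinates in separate arrays, two distances can be computed per iteration with _mm_sqrt_pd and no shuffles. The names dist_soa, xs, ys, px, py and out are hypothetical and not taken from the original program. The rint() rounding from dist() is left out here, and an odd leftover element is handled with _mm_sqrt_sd. When SSE2 is not available at compile time, the sketch falls back to a plain scalar loop.

```c
#include <stddef.h>

#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: Euclidean distance from (px, py) to each of the
 * n points stored SOA-style in xs[] and ys[]; results go to out[].
 * Rounding with rint(), as in the original dist(), is not applied. */
static void dist_soa(const double *xs, const double *ys, size_t n,
                     double px, double py, double *out)
{
    const __m128d vpx = _mm_set1_pd(px);
    const __m128d vpy = _mm_set1_pd(py);
    size_t k = 0;

    /* Main loop: two points per iteration, unaligned loads/stores. */
    for (; k + 2 <= n; k += 2) {
        __m128d xd = _mm_sub_pd(_mm_loadu_pd(xs + k), vpx);
        __m128d yd = _mm_sub_pd(_mm_loadu_pd(ys + k), vpy);
        __m128d s  = _mm_add_pd(_mm_mul_pd(xd, xd), _mm_mul_pd(yd, yd));
        _mm_storeu_pd(out + k, _mm_sqrt_pd(s));
    }

    /* Odd leftover element: a single scalar sqrt via _mm_sqrt_sd. */
    if (k < n) {
        __m128d xd = _mm_set_sd(xs[k] - px);
        __m128d yd = _mm_set_sd(ys[k] - py);
        __m128d s  = _mm_add_sd(_mm_mul_sd(xd, xd), _mm_mul_sd(yd, yd));
        _mm_store_sd(out + k, _mm_sqrt_sd(s, s));
    }
}
#else
#include <math.h>

/* Scalar fallback with the same interface when SSE2 is unavailable. */
static void dist_soa(const double *xs, const double *ys, size_t n,
                     double px, double py, double *out)
{
    for (size_t k = 0; k < n; k++) {
        double xd = xs[k] - px;
        double yd = ys[k] - py;
        out[k] = sqrt(xd * xd + yd * yd);
    }
}
#endif
```

A call such as dist_soa(xs, ys, n, xs[i], ys[i], out) would fill out[] with the distances from point i to every point. Whether this actually beats the serial version depends on the compiler, the flags and the data size, so it would need to be measured.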