On Fri, Jun 6, 2008 at 4:11 AM, Tim Prince <TimothyPrince@xxxxxxxxxxxxx> wrote:
> Gautam Sewani wrote:
>>
>> That is very bad news indeed :-( .
>> Can anyone confirm this with some testing? (I am using a Core Duo, and
>> don't have access to a Core 2 Duo.)
>> Regards
>> Gautam
>> On Thu, Jun 5, 2008 at 7:26 PM, Frédéric Bastien <nouiz@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> With Intel processors before the Core 2, there was a bottleneck in the
>>> CPU that made every SSE instruction be split in two. Since an SSE
>>> instruction holds only two doubles, on a processor with such a
>>> bottleneck I see only one way to get a speedup: use float instead of
>>> double. I know this is not always an option. To my knowledge the
>>> Prescott CPUs have this bottleneck.
>
> bad mix of top and bottom posting, some elided
>
> I don't see how this relates to the beginning of the thread. It's true that
> some CPUs in the past (Pentium M, AMD before Barcelona) always split 128-bit
> operands into two 64-bit operands. This doesn't mean you should avoid
> parallel SSE2, although it may reinforce the point that you should consider
> whether you are going about your task the best way.
>

Hi,

Instead of the code I was originally referring to, I tried a very simple
task: adding an array of 2-dimensional vectors. For timing, I used the
Boost timer class. I made three versions: one without any SIMD instructions
(http://pastebin.com/m3e8838c2), one using SSE2 instructions via Intel
intrinsics (http://pastebin.com/m783f8e7d), and one using SSE2 instructions
through the GCC vector extensions (http://pastebin.com/m6f36194e). A rough
sketch of the three versions is appended below. The best times were
obtained without using any SIMD instructions. For compiling I used
-march=prescott and -O3. When I compiled without the -O3 flag, the code
with the GCC vector extensions was 1.5 times faster than the one without
SIMD instructions, and the Intel intrinsics code was the slowest. Any help
will be greatly appreciated.

Regards
Gautam
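
P.S. In case the pastebin links above stop working, here is a minimal
sketch of what the three versions could look like. The array names, sizes,
alignment handling, and initialization are assumptions for illustration,
not the exact pastebin code; timing is omitted.

// Three ways to add arrays of 2-D double vectors (sketch, not the
// original pastebin code). Build e.g. with: g++ -O3 -march=prescott add2d.cxx
#include <emmintrin.h>   // SSE2 intrinsics: __m128d, _mm_add_pd, ...
#include <cstddef>
#include <cstdio>

const std::size_t N = 1 << 16;   // number of 2-D vectors (assumed size)

// 16-byte alignment so the aligned SSE2 loads/stores below are legal.
static double a[2 * N] __attribute__((aligned(16)));
static double b[2 * N] __attribute__((aligned(16)));
static double c[2 * N] __attribute__((aligned(16)));

// 1) Plain scalar version: no explicit SIMD, the compiler does what it wants.
void add_scalar()
{
    for (std::size_t i = 0; i < 2 * N; ++i)
        c[i] = a[i] + b[i];
}

// 2) SSE2 via Intel intrinsics: one 2-D vector (two doubles) per addpd.
void add_sse2_intrinsics()
{
    for (std::size_t i = 0; i < 2 * N; i += 2) {
        __m128d va = _mm_load_pd(&a[i]);
        __m128d vb = _mm_load_pd(&b[i]);
        _mm_store_pd(&c[i], _mm_add_pd(va, vb));
    }
}

// 3) GCC vector extension: '+' on v2df maps to addpd on SSE2 targets.
typedef double v2df __attribute__((vector_size(16)));

void add_gcc_vector()
{
    // Casting double* to v2df* is common in practice, though not strictly
    // portable under the aliasing rules.
    const v2df *va = reinterpret_cast<const v2df *>(a);
    const v2df *vb = reinterpret_cast<const v2df *>(b);
    v2df *vc = reinterpret_cast<v2df *>(c);
    for (std::size_t i = 0; i < N; ++i)
        vc[i] = va[i] + vb[i];
}

int main()
{
    for (std::size_t i = 0; i < 2 * N; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = 1.0;
    }
    add_scalar();
    add_sse2_intrinsics();
    add_gcc_vector();
    std::printf("c[3] = %f\n", c[3]);   // keep the work observable
    return 0;
}

One thing to keep in mind when comparing the timings: depending on the GCC
version, -O3 may enable the auto-vectorizer (-ftree-vectorize), in which
case the "scalar" version can itself be compiled to SSE2 code.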