It appears that you have a Conroe-based Core 2 Duo rather than a Penryn-based one. Pre-Penryn CPUs have no packed 32-bit integer multiply instruction; it arrived with SSE4.1 (pmulld) in Penryn, and later Intel CPUs have it as well. On older CPUs the multiplication has to be emulated, for example by moving the values into scalar registers, multiplying there and moving the results back, which is why it is slow. If you do a floating-point multiply instead, I am sure it will come out faster. (You may not need it though; I don't know about your app.) Use CPU-Z to find out which CPU generation yours belongs to.

BTW, don't declare such giant vectors (2 KB wide is too much). It is possible that gcc gave up on it and generated scalar code with some overhead. Use 4-wide vectors and loop over them; gcc will then (I am pretty sure about this) generate vector math instructions.

And yes, at -O3 gcc does auto-vectorization as well. The examples you posted fall into that class (gcc handles them without bothering you). My hunch is that the loop version is vectorized but your gcc vector version isn't.

GCC vector extensions are nice, but they don't suit my taste (even when paired with a union). Intrinsics give you more options than gcc's vector extensions offer, and those extra intrinsics often help. They have helped me.

> Why are vectors so much slower than plain old loops? Shouldn't
> they be faster? Do I have to actually call the built-in MMX and
> SSE instructions myself? Shouldn't the compiler be able to do this
> given this much information?

No, you don't. Look up Intel's SSE reference guide. The C functions (called intrinsics) for using those instructions are listed there. You don't have to mess with asm if you don't want to. These functions mostly map to one CPU instruction per call.

I felt the same way then as you do now, so I went ahead and wrote an inlined wrapper library so that I don't have to bother with arcane intrinsic names.

Intel's compilers are very good when it comes to automagic vectorization.
Try them if you don't like my solution (I am using it in production code, btw). Your example will sure as hell be autovectorized by the Intel compiler.

HTH

--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology Bombay
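
P.S. In case it helps, here is a rough sketch of what I mean by "4-wide vectors plus a loop", first with gcc's vector extensions and then with SSE intrinsics for the floating-point case. The function names mul_arrays and mul_arrays_ps are made up for illustration, the code assumes gcc or clang, and the intrinsics path assumes an x86 target with SSE (there is a plain scalar fallback otherwise). Treat it as a starting point under those assumptions, not as what your code must look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* gcc/clang vector extension: four 32-bit ints in 16 bytes,
   i.e. the width of one SSE register. */
typedef int v4si __attribute__((vector_size(16)));

/* out[i] = a[i] * b[i], four elements per step.
   mul_arrays is a hypothetical name, not taken from any library. */
void mul_arrays(const int *a, const int *b, int *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si va, vb, vr;
        memcpy(&va, a + i, sizeof va);    /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vb;                     /* element-wise multiply */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; ++i)                    /* scalar tail for leftover elements */
        out[i] = a[i] * b[i];
}

/* Same idea for floats, written with intrinsics where SSE is available.
   mul_arrays_ps is likewise a hypothetical name. */
void mul_arrays_ps(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
#endif
    for (; i < n; ++i)                    /* tail, or everything without SSE */
        out[i] = a[i] * b[i];
}
```

Compile with something like gcc -O2 and look at the generated assembly (gcc -S) to check whether you actually got packed instructions. What gcc emits for the integer version depends on the target CPU flags, so don't take it for granted.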