Hello all,

I'm stuck on a silly issue and I'm hoping there is a simple solution to it. I have a piece of code that does nothing but perform a very large number of products between std::complex<float> values and float values in a loop.

With gcc-4.1.2 and gcc-4.2.4 my standard test case runs in about 7:25 minutes and 6:50 minutes respectively on 3.0 GHz Penryn CPUs (single-threaded). With gcc-4.3.4, gcc-4.4.2, or even the svn version, the run time is more than 40 minutes, which is a serious drop in performance. For this test I reduced the compiler options to -O3 only.

I looked a bit at the generated assembly, and two things are apparent:

1. gcc-4.3 and newer produce assembly code about twice as long as the older gcc versions.
2. gcc-4.1 and 4.2 write out all the multiplications in SSE code, while 4.3 and newer call a routine named __mulsc3.

Has anybody encountered such a performance drop, and does anyone know whether there is a compiler flag or some other way to get my performance back?

Thank you,
Thomas Witzel
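
For reference, a minimal sketch of the kind of loop described above. This is not the actual code: the function name weighted_sum and its parameters are hypothetical, and it assumes the kernel scales each complex sample by a real weight and accumulates the result. Whether it reproduces the slowdown depends on how the real code forms the product, for example whether the float is first converted to std::complex<float>.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical kernel (an assumption, not the code from the post):
// multiply each complex sample by a real weight and accumulate the sum.
std::complex<float> weighted_sum(const std::vector<std::complex<float> >& samples,
                                 const std::vector<float>& weights)
{
    std::complex<float> acc(0.0f, 0.0f);

    // Only iterate over indices that are valid in both vectors.
    const std::size_t n =
        samples.size() < weights.size() ? samples.size() : weights.size();

    for (std::size_t i = 0; i < n; ++i) {
        // std::complex<float> times float
        acc += samples[i] * weights[i];
    }
    return acc;
}
```

Compiling a file containing this with g++ -O3 -S under each compiler version and searching the resulting assembly for __mulsc3 should show whether the library call appears for this form of the product.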