On Thu, 2005-02-24 at 12:20 +0100, Brian Budge wrote:
> Hi Richard -
>
> With this kind of example you should definitely get about a 4 times
> speed up. One of your issues may be that gcc doesn't seem (I haven't
> confirmed this with anyone) to like to perform instruction scheduling
> on vector types. I have also seen similar slowdowns when using
> xmmintrin.h code if I code things naively.
>
> My advice: Try to write the code out long hand using the xmm
> intrinsics, interleaving loads and arithmetic, and see if you get a
> speed up.
>
> Can anyone confirm if gcc does sub-optimal instruction scheduling for
> vector types?

Every compiler does sub-optimal scheduling for everything (Optimal
scheduling is NP-hard) :)

I think you meant "worse than it could be".
In that case, yes, but it depends on the platform.

Some platforms with supported vector instructions have scheduler
descriptions that include the vector instructions. Some do not.
For example, the 7450 and G5 scheduling descriptions describe the
vector units and schedule vector code.

--Dan
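To illustrate Brian's suggestion of writing the loop out longhand with the xmm intrinsics, here is a minimal sketch, assuming an x86 target where gcc defines __SSE__ (it falls back to a plain scalar loop otherwise). The kernel, a hypothetical scale_add computing y[i] = a*x[i] + y[i], handles two 4-float vectors per iteration and places the loads for the second vector between the arithmetic on the first. The source order is only a hint: gcc's scheduler and the CPU's out-of-order hardware may both rearrange it, so whether it pays off depends on the target, as noted above.

```c
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Hypothetical example kernel: y[i] = a * x[i] + y[i] for i in [0, n). */
void scale_add(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
#ifdef __SSE__
    __m128 va = _mm_set1_ps(a);          /* broadcast a into all 4 lanes */

    /* Two independent 4-wide streams per iteration; loads for the
       second stream are written between the arithmetic of the first,
       so memory and arithmetic work can overlap. */
    for (; i + 8 <= n; i += 8) {
        __m128 x0 = _mm_loadu_ps(x + i);
        __m128 y0 = _mm_loadu_ps(y + i);
        __m128 x1 = _mm_loadu_ps(x + i + 4);
        __m128 t0 = _mm_mul_ps(va, x0);  /* a * x[i..i+3]   */
        __m128 y1 = _mm_loadu_ps(y + i + 4);
        __m128 t1 = _mm_mul_ps(va, x1);  /* a * x[i+4..i+7] */
        y0 = _mm_add_ps(t0, y0);
        _mm_storeu_ps(y + i, y0);
        y1 = _mm_add_ps(t1, y1);
        _mm_storeu_ps(y + i + 4, y1);
    }
#endif
    /* Scalar tail, or the whole loop when SSE is not available. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Timing scale_add against the naive scalar loop (for example, both built with gcc -O2 -msse) is the kind of experiment Brian suggests for checking whether the hand-ordered version actually helps.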