Hi,

On 10-2-10 10:57 PM, Brian Budge wrote:
> Hi -
>
> To me it is not at all surprising. These hairy strides and mods
> certainly aren't going to help. You're doing very little math vs
> load/store which means that you're not going to get much out of the

This is what my code needs to do; I cannot change it. I see that GCC can auto-vectorize code like:

    for (i=0; i<256; i++){
        a[i] = b[i] + c[i];
    }

That loop has even less math per load/store, yet vectorization presumably still improves its performance, since GCC chooses to vectorize it.

> vector units. Really you need more of a struct-of-arrays type layout
> (pack your doubles together so you can load them in a less strided
> fashion, and pack your ints together. This may have the extra benefit
> of unobfuscating the code :)

I don't understand. What do you mean by "less strided fashion"? Do you mean that all elements in the array should be of type v2df, and that I then access each element in the loop with i++? Why would this make a difference?

Best regards,
Zheng Da
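
For reference, a minimal sketch of what "less strided" could mean in the quoted suggestion. The original code is not shown in this message, so the record layout (one double paired with one int) and the names `item`, `items`, `scale_aos` and `scale_soa` are all hypothetical, chosen only to contrast the two layouts.

```c
#include <stddef.h>

#define N 256

/* Array-of-structs (hypothetical layout): each double is followed by an
   int plus padding, so consecutive doubles are typically 16 bytes apart. */
struct item {
    double value;
    int    index;
};

/* Struct-of-arrays: all doubles are contiguous, all ints are contiguous. */
struct items {
    double value[N];
    int    index[N];
};

/* Strided access: every iteration steps over the interleaved int field. */
void scale_aos(struct item *a, double k)
{
    for (size_t i = 0; i < N; i++)
        a[i].value *= k;
}

/* Unit-stride access: two adjacent doubles can fill one 128-bit vector
   load, the same access pattern as the a[i] = b[i] + c[i] loop. */
void scale_soa(struct items *s, double k)
{
    for (size_t i = 0; i < N; i++)
        s->value[i] *= k;
}
```

Both functions compute the same result. The difference is only in memory layout: in the second form the doubles sit next to each other, which is the kind of access a vectorizer usually handles most easily.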