Gang Chen <gchen@xxxxxxxxxxxxxx> writes:

> In fact by "the multiplying part only", I mean storing the results to a
> local temporary double variable "tmp" instead of the largest array. You
> can see it in the program.

What I saw in your program is that your line 1 measurement did this:

    for(i = 0; i <N_SHORT ; i++){
      tmp = vecS[i]*vecL[j];

and did not use tmp anywhere else. The compiler is smart enough to know
that if you assign a value to a local variable and never use that
variable, it can discard the assignment along with all the computations
that lead up to it. In this case, the compiler will never do the
multiplication at all. In other words, the compiler is smarter than you
think.

> I also tried without optimization, and the difference is still large:
> about 10s vs 30s.

Timings when not optimizing aren't all that meaningful. However, in
general, you are certainly correct that your program is going to take
longer if you store values into memory. You neglected to mention what
type of machine you are using, and the tradeoffs are different for each
one.

I didn't see anything obviously slow in the program. On some
processors, such as a PowerPC with Altivec support, the program will
run faster if you can use the vector instructions.

Note that general details of program optimization are off-topic for the
gcc mailing list, unless you are discussing specific compiler
optimizations or enhancements.

Ian
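
As a rough illustration of the dead-assignment point above: one way to
keep the multiplication from being discarded is to make its result
observable, for example by accumulating the products and returning the
sum. This is only a sketch. The function name sum_products and its
parameters are invented here and do not come from the original program.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the original program.
 * Multiplies every element of the short vector by one element of the
 * long vector and returns the sum of the products.  Because the sum is
 * returned to the caller, the multiplications feed a value that is
 * actually used and cannot be thrown away as a dead assignment. */
double sum_products(const double *vecS, size_t n_short, double l)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_short; i++) {
        sum += vecS[i] * l;   /* result stays live through sum */
    }
    return sum;
}
```

The caller still has to use the returned value (print it, say);
otherwise the compiler may drop the work again after inlining. The extra
addition per iteration also means this times slightly more than the
multiplication alone.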