I am using C with the gcc compiler to write some vectorizable loops for a Nehalem processor. The function parameters are of type const float* and float*. The problem is that gcc loads these parameters into the vector registers with two instructions:

    movlps (%rdx), %xmm0
    movhps 8(%rdx), %xmm0

instead of a single one:

    movups (%rdx), %xmm0    (for an unaligned access)
    movaps (%rdx), %xmm0    (for an aligned access)

I am compiling with the following flags:

    -O2 -fexpensive-optimizations -ftree-vectorize -fargument-noalias-global -msse3 -ftree-vectorizer-verbose=2

I would like to know several things:

1.- How can I avoid the load being performed with two move instructions instead of one?
2.- Once gcc performs the loads with a single move instruction, how can I force it to use only aligned move instructions?
3.- Finally (out of scope of this message), if I use the -O3 optimization flag, gcc is not able to vectorize my loops because they are fully unrolled before the vectorization pass runs. The loops iterate from 0 to 8. Is there any way to avoid this?

Thanks in advance,
Jandro

-- 
View this message in context: http://old.nabble.com/Vector-parameter-loads-to-SSE-registers-tp30453479p30453479.html
Sent from the gcc - Help mailing list archive at Nabble.com.
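One way to sidestep questions 1 to 3 is to write the kernel with SSE intrinsics instead of relying on the auto-vectorizer. The sketch below assumes an 8-element float loop and a simple c[i] = a[i] + b[i] body; the function names add8_unaligned, add8_aligned and add8_hint are hypothetical. _mm_loadu_ps is normally emitted as a single movups and _mm_load_ps as a single movaps, but it is worth confirming with gcc -S for your compiler version and -mtune setting. The aligned variant requires 16-byte aligned pointers, otherwise movaps faults at run time. Because the vector code is explicit, full unrolling at -O3 only unrolls the two vector iterations and cannot undo the vectorization. The last variant keeps plain C for the vectorizer and uses __builtin_assume_aligned, which only exists in GCC 4.7 and later, so it may be newer than the compiler discussed here.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_add_ps, ... */

#define N 8 /* assumed trip count (i = 0 .. 7) */

/* Hypothetical kernel c[i] = a[i] + b[i] with unaligned loads/stores
 * (movups): no alignment requirement on a, b or c. */
void add8_unaligned(const float *a, const float *b, float *c)
{
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}

/* Same kernel with aligned loads/stores (movaps). The caller must pass
 * 16-byte aligned pointers; the asserts document that precondition. */
void add8_aligned(const float *a, const float *b, float *c)
{
    assert(((uintptr_t)a & 15) == 0);
    assert(((uintptr_t)b & 15) == 0);
    assert(((uintptr_t)c & 15) == 0);
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(c + i, _mm_add_ps(va, vb));
    }
}

/* Plain C for the auto-vectorizer: restrict rules out aliasing and
 * __builtin_assume_aligned (GCC 4.7+) promises 16-byte alignment, so the
 * vectorizer may use aligned moves without a runtime alignment check. */
void add8_hint(const float *restrict a, const float *restrict b,
               float *restrict c)
{
    const float *pa = __builtin_assume_aligned(a, 16);
    const float *pb = __builtin_assume_aligned(b, 16);
    float *pc = __builtin_assume_aligned(c, 16);
    for (int i = 0; i < N; i++)
        pc[i] = pa[i] + pb[i];
}
```

The caller is responsible for the alignment, for example by declaring the arrays with _Alignas(16) (C11) or __attribute__((aligned(16))). Passing a misaligned pointer to add8_aligned or add8_hint is undefined behaviour.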