Vector parameter loads to SSE registers

windigo84 <aberna@xxxxxxxxxx> · Tue, 14 Dec 2010 02:33:13 -0800 (PST)

I am using C and gcc to write some vectorizable loops for a Nehalem
processor. The function parameters are of type const float* and float*.
The problem is that gcc loads from these parameters into the vector
registers with these two instructions:

movlps	(%rdx), %xmm0
movhps	8(%rdx), %xmm0

instead of using a single instruction:

movups (%rdx), %xmm0              (if unaligned access)
movaps (%rdx), %xmm0              (if aligned access)

I am compiling with the following flags:

-O2  -fexpensive-optimizations -ftree-vectorize -fargument-noalias-global
-msse3  -ftree-vectorizer-verbose=2

I would like to know several things:

1.- How can I avoid the load being performed with two move instructions
instead of one?
2.- Once gcc performs the loads with a single move instruction, how can I
force it to use only aligned move instructions?
3.- Finally (outside the scope of this message), if I use the -O3
optimization flag, gcc is not able to vectorize my loops because they are
fully unrolled before the vectorization pass runs. The loops iterate from
0 to 8. Is there any way to avoid this?
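
A minimal sketch of the kind of function described above, for illustration
only. The name scale_add, the loop body, and the trip count of 8
(i = 0..7) are assumptions, not the original code; only the parameter types
(const float* and float*) and the short fixed-length loop come from the
description.

```c
#include <stddef.h>

#define N 8  /* assumed trip count: "from 0 to 8" read as i = 0..7 */

/* Hypothetical kernel: the name and the arithmetic are illustrative.
 * Nothing in the signature tells the compiler how the two pointers are
 * aligned, so it cannot assume 16-byte alignment for the loads from
 * `in` or the stores to `out`. */
void scale_add(const float *in, float *out)
{
    size_t i;

    for (i = 0; i < N; i++)
        out[i] = in[i] * 2.0f + 1.0f;
}
```

Compiling a file containing such a function with the flags listed above and
the -S option produces the assembly in which the load sequence can be
inspected.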

Thanks in advance,

Jandro

-- 
View this message in context: http://old.nabble.com/Vector-parameter-loads-to-SSE-registers-tp30453479p30453479.html
Sent from the gcc - Help mailing list archive at Nabble.com.