Vector parameter loads to SSE registers

I am using C and gcc to write some vectorizable loops for a Nehalem
processor. The function parameters are of type const float* and float*.
The problem is that gcc loads from these parameters into the vector
registers with these two instructions:

movlps	(%rdx), %xmm0
movhps	8(%rdx), %xmm0


instead of a single instruction:

movups (%rdx), %xmm0              (if unaligned access)
movaps (%rdx), %xmm0              (if aligned access)


I am compiling with the following flags:

-O2  -fexpensive-optimizations -ftree-vectorize -fargument-noalias-global
-msse3  -ftree-vectorizer-verbose=2
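
For illustration, a loop of the kind described above might look like the
minimal sketch below. The original source is not included in this message,
so the function name scale_add, the constant N and the arithmetic are all
assumptions. The relevant point is that the parameters are plain float
pointers, so the compiler cannot assume 16-byte alignment from the types
alone.

```c
#include <stddef.h>

/* Hypothetical example: NOT the original code from this message. */
enum { N = 8 };  /* assumed trip count; the loops iterate from 0 to 8 */

/* Plain pointer parameters: nothing here tells the compiler that
 * in or out are 16-byte aligned. */
void scale_add(const float *in, float *out, float k)
{
    for (size_t i = 0; i < N; i++)
        out[i] = in[i] * k + out[i];
}
```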


I would like to know several things:

1.- How can I avoid the load being performed with two move instructions
instead of one?
2.- Once gcc performs the loads with a single move instruction, how can I
force it to use only aligned move instructions?
3.- Finally (outside the scope of this message), if I use the -O3
optimization flag, gcc is not able to vectorize my loops because they are
fully unrolled before the vectorization pass runs. The loops iterate from
0 to 8. Is there any way to avoid this?
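
To make questions 1 and 2 concrete, here is a minimal sketch that requests
the loads explicitly with the SSE intrinsics from xmmintrin.h instead of
relying on the auto-vectorizer. The helper names add8_unaligned and
add8_aligned are hypothetical. _mm_loadu_ps is the unaligned 128-bit load
and _mm_load_ps is the aligned one; they usually correspond to movups and
movaps, but the instructions actually emitted depend on the gcc version and
tuning options, so the generated assembly should be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, ... */

/* Hypothetical helper: adds 8 floats, 4 at a time, with unaligned
 * 128-bit loads and stores. Works for any float pointer. */
void add8_unaligned(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}

/* Hypothetical helper: same computation with aligned loads and stores.
 * All three pointers MUST be 16-byte aligned, otherwise the aligned
 * load/store faults at run time. */
void add8_aligned(const float *a, const float *b, float *out)
{
    assert(((uintptr_t)a & 15) == 0);
    assert(((uintptr_t)b & 15) == 0);
    assert(((uintptr_t)out & 15) == 0);

    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
}
```

The aligned variant is only safe when the caller guarantees the alignment,
for example by declaring the arrays with __attribute__((aligned(16))).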

Thanks in advance,

Jandro


-- 
View this message in context: http://old.nabble.com/Vector-parameter-loads-to-SSE-registers-tp30453479p30453479.html
Sent from the gcc - Help mailing list archive at Nabble.com.


