Re: Vector parameter loads to SSE registers


On 12/14/2010 5:33 AM, windigo84 wrote:
I am using C and gcc to program some vectorizable loops on a Nehalem
processor. The function parameters are of type const float* and float*.
The problem is that gcc performs the loads from these parameters into the
vector registers with these two instructions:

movlps	(%rdx), %xmm0
movhps	8(%rdx), %xmm0


instead of using:

movups (%rdx), %xmm0              (if unaligned access)
movaps (%rdx), %xmm0              (if aligned access)


I am compiling with the following flags:

-O2  -fexpensive-optimizations -ftree-vectorize -fargument-noalias-global
-msse3  -ftree-vectorizer-verbose=2


I would like to know several things:

1.- How can I avoid the load being performed with two move instructions
instead of one?
2.- Once gcc performs the loads with only one move instruction, how can I
force it to use only aligned move instructions?
3.- Finally (out of scope of this message), if I use the -O3 optimization
flag, gcc is not able to vectorize my loops because they are fully unrolled
before the vectorization pass. The loops iterate from 0 to 8. Is there any
way to avoid this?


You're asking gcc to optimize for Core 2 and other CPUs of that era, where unaligned moves (movups) were more expensive than split ones (movlps plus movhps). Use -mtune=barcelona if you want gcc to use movups; if your gcc is so old that it doesn't support that option, you will need to upgrade. That option is suitable for recent Intel CPUs as well as AMD.

You would need __attribute__((aligned)) and similar gcc extensions to inform the compiler about alignment, so that it can make more use of aligned instructions. Without those, it's not practical to auto-vectorize short loops, except by explicit low-level coding (e.g. SSE intrinsics). In any case, a loop with a trip count of 9 might not be efficiently vectorizable.
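
As a rough illustration of the alignment hints mentioned above, here is a minimal sketch. The kernel name scale_add, the trip count N = 8, and the arithmetic are assumptions made up for the example, not taken from the original code. It also uses __builtin_assume_aligned, which only exists in gcc 4.7 and later (and in clang), so it will not build on older compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8  /* assumed trip count: a multiple of 4 floats (one SSE register) */

/* Hypothetical kernel: dst[i] += src[i] * s.
   restrict (C99) promises that dst and src do not overlap.
   __builtin_assume_aligned promises 16-byte alignment to the optimizer;
   breaking that promise is undefined behaviour, so the assert checks it
   in debug builds. */
void scale_add(float *restrict dst, const float *restrict src, float s)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    float *d = __builtin_assume_aligned(dst, 16);
    const float *a = __builtin_assume_aligned(src, 16);

    for (size_t i = 0; i < N; ++i)
        d[i] += a[i] * s;
}
```

The caller has to supply storage that really is aligned, for example arrays declared as float x[N] __attribute__((aligned(16))). Whether the vectorizer then emits movaps depends on the gcc version and on the -O/-mtune settings, so it is worth checking the output of gcc -S rather than assuming it.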

--

Tim Prince


