Re: Vector parameter loads to SSE registers

On 12/15/2010 5:24 AM, windigo84 wrote:
Thanks Tim. Now it's using only one move instruction (I am using gcc 4.4.3).
About the alignment problem, I am using this typedef:

typedef float* af __attribute__ ((__aligned__(16)));

It still uses unaligned move instructions, though. Perhaps that is because
on Nehalem architectures (Barcelona tune) the movups and movaps instructions
take the same number of cycles for aligned moves, so the compiler is
conservative and uses movups. Anyway, thanks,

Jandro
Maybe the actual definition of the aligned data has to be visible before gcc will treat it as aligned. As you say, movaps is no faster than movups for loads on current CPUs, so the main question is whether the compiler chooses to generate a remainder loop for alignment. Such a remainder loop will hurt the efficiency of vectorizing short loops.
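
As a rough sketch of what "visible" could mean here (the array names, the size N, and the helper functions are made up for illustration, not taken from the original code): the aligned attribute is attached to the array definitions themselves, in the same translation unit as the loop, instead of to a pointer typedef. A typedef such as the one quoted above applies the alignment to the pointer type, which may not tell gcc anything about the floats it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 1024  /* hypothetical array length */

/* Hypothetical arrays: the 16-byte alignment is part of the object
 * definitions, so the compiler can see it where the loop is compiled. */
static float a[N] __attribute__((aligned(16)));
static float b[N] __attribute__((aligned(16)));

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Candidate loop for vectorization. It works directly on the visible
 * arrays, so no alignment information is lost through a pointer. */
static void add_arrays(void)
{
    size_t i;

    /* Run-time sanity check of the assumption stated above. */
    assert(is_aligned16(a) && is_aligned16(b));

    for (i = 0; i < N; i++)
        a[i] += b[i];
}
```

Compiling with something like gcc -O3 -msse2 -S and reading the generated assembly should show whether aligned moves were chosen and whether an extra alignment loop was emitted. The outcome may well differ between gcc versions and -mtune settings, so treat this as something to try, not as a guaranteed fix.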

--
Tim Prince


