AVX - cast from YMM to XMM without overhead?

Hello,

Using gcc 4.7.0, I am trying to use vector extensions and AVX builtins
to sum 8 complex numbers, where the real and imaginary parts are
stored in separate YMM registers, and store the result in a 64-bit
memory location (m64). In other words, I'd like to achieve the
following:

  v2sf* z = ...;
  v8sf re = ...;
  v8sf im = ...;
  *z  = { sum(re[0..7]), sum(im[0..7]) };
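
For reference, the intended result can be written as a plain scalar loop. This is only a sketch of the semantics, not the vectorized code in question: the names csum8_ref and csum8_ref_selfcheck are hypothetical, the typedefs are assumed to match the v2sf/v8sf types used here, and the left-to-right summation order differs from a horizontal-add tree, so rounding may differ for general inputs.

```c
#include <assert.h>

/* Assumed definitions of the vector types used in this post. */
typedef float v2sf __attribute__((vector_size(8)));
typedef float v8sf __attribute__((vector_size(32)));

/* Hypothetical scalar reference: *z = { sum(re[0..7]), sum(im[0..7]) }. */
static void csum8_ref(v2sf *z, const v8sf *re, const v8sf *im)
{
    float sr = 0.0f, si = 0.0f;
    int k;

    for (k = 0; k < 8; ++k) {
        sr += (*re)[k];   /* accumulate real parts */
        si += (*im)[k];   /* accumulate imaginary parts */
    }
    *z = (v2sf){ sr, si };
}

/* Small self-check with values that are exact in single precision. */
static void csum8_ref_selfcheck(void)
{
    const v8sf re = { 1, 2, 3, 4, 5, 6, 7, 8 };
    const v8sf im = { 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f };
    v2sf z;

    csum8_ref(&z, &re, &im);
    assert(z[0] == 36.0f && z[1] == 4.0f);
}
```

The operands are passed by pointer so that the sketch compiles whether or not AVX code generation is enabled.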

My first attempt at this was:

  v8sf a = __builtin_ia32_haddps256(re, im);         // iirr iirr
  v4sf b = __builtin_ia32_vextractf128_ps256(a, 1);
  v4sf c = (v4sf)a + b;                              //      iirr
  v4sf d = __builtin_ia32_haddps(c, c);              //      irir
  *z = (v2sf)d;                                      //        ir

However, gcc does not seem to allow casting between vectors of
different lengths (why?!). Hence, my second attempt is as follows:

  v8sf a = __builtin_ia32_haddps256(re, im);         // iirr iirr
  v4sf b = __builtin_ia32_vextractf128_ps256(a, 1);
  v4sf c = *((v4sf*)&a) + b;                         //      iirr
  v4sf d = __builtin_ia32_haddps(c, c);              //      irir
  __builtin_ia32_storelps(z, d);                     //        ir

The problem with this is that, to compute "c" from the low half of
"a", gcc generates an intermediate instruction (vmovdqa) that stores
the contents of the YMM register holding "a" to memory, instead of
directly accessing the low 128 bits via the corresponding XMM register.

I have also looked into using inline assembly to avoid the overhead;
however, as far as I can tell, it is not possible to use YMM registers
as parameters.

Is there any way to cast from a YMM to an XMM register without
incurring any performance penalty?
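
One avenue that may be worth trying is the intrinsic _mm256_castps256_ps128 from <immintrin.h>, which Intel documents as a compile-time-only cast that generates no instructions. The sketch below is untested against gcc 4.7.0 and makes several assumptions: the names csum8_avx and csum8_avx_selfcheck are hypothetical, the inputs are loaded from float arrays instead of already living in registers, and the target("avx") attribute and __builtin_cpu_supports need a compiler newer than 4.7.0 (with an older gcc, drop the attribute and the runtime check and compile with -mavx). Whether a given gcc version really emits no extra move for the cast would have to be checked in the generated assembly.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: z = { sum(re[0..7]), sum(im[0..7]) } using AVX. */
__attribute__((target("avx")))
static void csum8_avx(float z[2], const float re[8], const float im[8])
{
    __m256 vre = _mm256_loadu_ps(re);
    __m256 vim = _mm256_loadu_ps(im);

    __m256 a = _mm256_hadd_ps(vre, vim);         /* iirr iirr */
    __m128 b = _mm256_extractf128_ps(a, 1);      /* upper 128 bits of a */
    /* Reinterpret the low 128 bits of a as an XMM value (no conversion). */
    __m128 c = _mm_add_ps(_mm256_castps256_ps128(a), b);  /* iirr */
    __m128 d = _mm_hadd_ps(c, c);                /* irir */
    _mm_storel_pi((__m64 *)z, d);                /* low 64 bits: ir */
}

/* Returns 1 if the check ran, 0 if the CPU lacks AVX and it was skipped. */
static int csum8_avx_selfcheck(void)
{
    const float re[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    const float im[8] = { 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f };
    float z[2];

    if (!__builtin_cpu_supports("avx"))
        return 0;
    csum8_avx(z, re, im);
    assert(z[0] == 36.0f && z[1] == 4.0f);
    return 1;
}
```

The instruction sequence mirrors the second attempt above; only the pointer cast through memory is replaced by the cast intrinsic.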

Also, who should I ask nicely to make gcc accept code such as my first
attempt above? :-)

-- 
Best regards,

Dag Lem

