Correct way to make a 16-byte aligned double* for SSE vectorization?

Hi,

I am trying to figure out how to make a double* that is 16-byte aligned in the way that SSE instructions want. Hopefully this would allow GCC to auto-vectorize loops in a better way. The problem that I am having is that I want a pointer to an aligned double, not an aligned pointer to a double.

I am compiling with these options:
% gcc -c test.C -O3 -ftree-vectorizer-verbose=3 -ffast-math

According to the vectorizer output, none of the three single-line ways (below) of declaring an aligned pointer actually works: the accesses are still treated as unaligned. Presumably the attribute is aligning the location of the pointer itself, not the location it points to. In contrast, if I first define an aligned double type and then define a pointer to it, the accesses are recognized as aligned. Is this the recommended approach?

I ask, because gcc-4.5 complains about declaring a 16-byte aligned double, if the double is an instantiation of a template parameter. (See PR42555.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42555
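
The difference between an aligned pointer and a pointer to aligned data can be checked directly. The following is a minimal sketch, assuming GCC or Clang with C++11 enabled; the names ptr_to_aligned, aligned_ptr and is_aligned_16 are hypothetical and exist only for this illustration. It shows that putting the attribute on the pointer typedef changes the alignment of the pointer type, while the plain double type it points to keeps its default alignment.

```cpp
#include <cstdint>

typedef double real;

// Attribute on the pointee type: a pointer to 16-byte aligned data.
typedef real aligned_real __attribute__((aligned(16)));
typedef const aligned_real* ptr_to_aligned;  // hypothetical name

// Attribute on the pointer typedef: the pointer object itself is aligned.
typedef const real* aligned_ptr __attribute__((aligned(16)));  // hypothetical name

static_assert(alignof(aligned_real) == 16, "aligned_real is 16-byte aligned");
static_assert(alignof(aligned_ptr) == 16, "the pointer type itself is 16-byte aligned");
static_assert(alignof(real) < 16, "a plain double has a smaller default alignment");

// Runtime check (hypothetical helper): is this address a multiple of 16?
inline bool is_aligned_16(const void* p)
{
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

A helper like is_aligned_16 can be called on the arguments before entering the loop to confirm that the data really sits on a 16-byte boundary, which is what the type in the working typedef pair promises to the compiler.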

Thanks for any help!

-BenRI

P.S. Here is the example code. As is, the pointers are recognized as aligned. However, if you comment out the definition of SSE_PTR and replace it with any of the three other approaches, they do not work.

typedef double real;

// these two lines work (together)
typedef real aligned_real __attribute__((aligned(16)));
typedef const aligned_real* SSE_PTR;

// none of these three approaches works to define an aligned pointer in a single line.
//typedef const real *SSE_PTR __attribute__((aligned(16)));
//typedef const real __attribute__((aligned(16))) *SSE_PTR;
//typedef const __attribute__((aligned(16))) real *SSE_PTR;

real f(SSE_PTR __restrict__ p, SSE_PTR __restrict__ q,int n)
{
  real sum = 0;
  for(int i=0; i<n;i++)
    sum += p[i] * q[i];

  return sum;
}
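
One possible alternative that avoids the aligned typedef entirely is to state the alignment per pointer value with the __builtin_assume_aligned builtin. This is only a sketch under assumptions: the builtin is available in GCC 4.7 and later and in Clang, so it is not an option for the gcc-4.5 compiler mentioned above, and the function name dot_assume_aligned is hypothetical. Whether the vectorizer then reports aligned accesses would still need to be confirmed from its output.

```cpp
typedef double real;

// Dot product where the 16-byte alignment is promised for each pointer
// value instead of being encoded in the pointer type. Calling this with
// a pointer that is not 16-byte aligned is undefined behavior.
real dot_assume_aligned(const real* __restrict__ p_in,
                        const real* __restrict__ q_in, int n)
{
  // __builtin_assume_aligned returns void*, so cast back to the data type.
  const real* p = static_cast<const real*>(__builtin_assume_aligned(p_in, 16));
  const real* q = static_cast<const real*>(__builtin_assume_aligned(q_in, 16));

  real sum = 0;
  for (int i = 0; i < n; i++)
    sum += p[i] * q[i];

  return sum;
}
```

Because the promise is attached to the values rather than to a type, this form does not require declaring a 16-byte aligned double type when real comes from a template parameter.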

