Hello, everyone
I have recently started to use the CilkPlus features of GCC 5, but I
cannot quite grasp the vectorization part. Either I am doing something
wrong, or there is a bug in GCC. I have inspected the generated asm of the
following two implementations of vector addition (a = a + b). The code
is compiled with 'gcc -O3 -mavx -ftree-vectorize -fopt-info-vec
-fcilkplus test.c'.
// ICC compatibility - alignment hint
#ifdef __GNUC__
#define __assume_aligned(lvalueptr, align) \
    lvalueptr = __builtin_assume_aligned(lvalueptr, align)
#endif
#define RESTRICT __restrict__
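Note that the alignment hint is a promise rather than a request: if a pointer passed in is not really 64-byte aligned, the behaviour is undefined. A minimal caller-side sketch, assuming a C11 libc that provides aligned_alloc (alloc_aligned_doubles, is_aligned and BUF_ALIGN are hypothetical names, not part of the code above), could look like this:

```c
/* Sketch: allocate buffers that really satisfy the 64-byte promise made by
 * __assume_aligned / __builtin_assume_aligned, and check it at run time.
 * alloc_aligned_doubles, is_aligned and BUF_ALIGN are hypothetical names. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_ALIGN 64

/* Returns nonzero if p is a multiple of align (align must be a power of 2). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* C11 aligned_alloc requires the size to be a multiple of the alignment,
 * so the byte count is rounded up. Release the result with free(). */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + BUF_ALIGN - 1) / BUF_ALIGN * BUF_ALIGN;
    if (bytes == 0)
        bytes = BUF_ALIGN;
    double *p = aligned_alloc(BUF_ALIGN, bytes);
    assert(p == NULL || is_aligned(p, BUF_ALIGN));
    return p;
}
```

Buffers obtained this way can be passed to test() or test_cilkplus1() below without breaking the assumption those functions make.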
---------------
usual C implementation
---------------
void test(double * RESTRICT a, double * RESTRICT b, int size)
{
int i;
__assume_aligned(a, 64);
__assume_aligned(b, 64);
for(i=0; i<size; i++)
a[i] = a[i] + b[i];
}
---------------
CilkPlus array notation
---------------
void test_cilkplus1(double * RESTRICT a, double * RESTRICT b, int size)
{
__assume_aligned(a, 64);
__assume_aligned(b, 64);
a[0:size] = a[0:size] + b[0:size];
}
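For comparison, the same intent (no aliasing, 64-byte alignment, please vectorize) can also be stated with an OpenMP 4.0 simd directive instead of Cilk Plus array notation. This is only a sketch and not one of the variants compiled above: test_omp_simd is a hypothetical name, and it assumes compiling with -fopenmp-simd (or -fopenmp). Without that flag GCC ignores the pragma and the function is an ordinary scalar loop, which is still correct.

```c
/* Hypothetical alternative to test_cilkplus1 using an OpenMP simd pragma.
 * Needs -fopenmp-simd (or -fopenmp) for the pragma to take effect. */
void test_omp_simd(double *__restrict__ a, const double *__restrict__ b,
                   int size)
{
    /* aligned(a, b : 64) promises both pointers are 64-byte aligned, like
     * the __assume_aligned hints; breaking that promise is undefined. */
#pragma omp simd aligned(a, b : 64)
    for (int i = 0; i < size; i++)
        a[i] = a[i] + b[i];
}
```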
The first function (test) is vectorized as expected; here is the asm:
.L4:
vmovapd (%rdi,%r8), %ymm0
addl $1, %r9d
vaddpd (%rsi,%r8), %ymm0, %ymm0
vmovapd %ymm0, (%rdi,%r8)
addq $32, %r8
cmpl %r9d, %ecx
ja .L4
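Reading the .L4 loop: each iteration advances %r8 by 32 bytes, i.e. 32 / 8 = 4 doubles held in one %ymm register. A rough hand-written equivalent using the GCC/Clang vector extensions is sketched below. test_by_hand and v4df are hypothetical names; memcpy is used for the loads and stores so the sketch does not depend on alignment or strict-aliasing rules, which means the compiler may emit vmovupd rather than vmovapd.

```c
/* Sketch of what the vectorized .L4 loop computes: 4 doubles per step. */
#include <string.h>

typedef double v4df __attribute__((vector_size(32)));  /* 4 x double */

void test_by_hand(double *__restrict__ a, const double *__restrict__ b,
                  int size)
{
    int i = 0;
    for (; size - i >= 4; i += 4) {          /* like: addq $32, %r8 */
        v4df va, vb;
        memcpy(&va, a + i, sizeof va);       /* like: vmovapd (%rdi,%r8), %ymm0 */
        memcpy(&vb, b + i, sizeof vb);
        va = va + vb;                        /* like: vaddpd */
        memcpy(a + i, &va, sizeof va);       /* like: vmovapd %ymm0, (%rdi,%r8) */
    }
    for (; i < size; i++)                    /* scalar tail: 0 to 3 elements */
        a[i] = a[i] + b[i];
}
```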
In contrast, the second function (test_cilkplus1) is not vectorized; it processes one double (8 bytes) per iteration:
.L21:
vmovsd (%rdi,%rax), %xmm0
movl %ecx, %r8d
addl $1, %ecx
vaddsd (%rsi,%rax), %xmm0, %xmm0
vmovsd %xmm0, (%rdi,%rax)
addq $8, %rax
cmpl %r8d, %edx
jg .L21
I have made sure that the compiler knows there is no aliasing
(restrict) and that the arrays are aligned in memory. Evidently this
is enough for the standard implementation, but not for the CilkPlus
array notation.
Is this a bug, or am I missing something?
Thanks a lot!
Marcin