Consider this simple piece of code, which takes the product of an array of complex numbers.

    #include <complex.h>
    complex float f(complex float x[]) {
      complex float p = 1.0;
      for (int i = 0; i < 32; i++)
        p *= x[i];
      return p;
    }

If I compile it with -O3 -march=bdver2 -ffast-math I get:

    f:
      vmovss xmm2, DWORD PTR .LC1[rip]
      vxorps xmm1, xmm1, xmm1
      lea rax, [rdi+256]
    .L2:
      vmovss xmm0, DWORD PTR [rdi+4]
      add rdi, 8
      vmulss xmm3, xmm0, xmm2
      vmulss xmm0, xmm0, xmm1
      vfmadd132ss xmm1, xmm3, DWORD PTR [rdi-8]
      vfmsub132ss xmm2, xmm0, DWORD PTR [rdi-8]
      cmp rax, rdi
      jne .L2
      vmovss DWORD PTR [rsp-8], xmm2
      vmovss DWORD PTR [rsp-4], xmm1
      vmovq xmm0, QWORD PTR [rsp-8]
      ret
    .LC1:
      .long 1065353216

That is unvectorised assembly: every arithmetic instruction is a scalar `ss` operation. This is with gcc version 7 (snapshot), but earlier versions give similar results.

However, if I do the same thing with float instead of complex float I get:

    f(float*):
      vmovups xmm2, XMMWORD PTR [rdi]
      vmulps xmm0, xmm2, XMMWORD PTR [rdi+16]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+32]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+48]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+64]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+80]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+96]
      vmulps xmm0, xmm0, XMMWORD PTR [rdi+112]
      vpsrldq xmm1, xmm0, 8
      vmulps xmm0, xmm0, xmm1
      vpsrldq xmm1, xmm0, 4
      vmulps xmm0, xmm0, xmm1
      ret

This is now vectorised code, using packed `ps` multiplies.

Is there any way to persuade gcc to vectorise the complex version? My ultimate goal is to get efficient AVX code for this function.

As a test I also tried icc (the Intel Compiler), which does appear to give vectorised code, so it is at least possible in principle.

Raphael
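For reference, the scalar loop in the complex listing appears to correspond to the usual real/imaginary expansion of a complex multiply. The sketch below writes that expansion out by hand. The name `f_split` is hypothetical (it is not in the code above), and the sketch assumes the relaxed semantics that -ffast-math permits, i.e. no special handling of NaN or infinity.

```c
#include <complex.h>

/* Hypothetical helper for illustration: the same 32-element product,
   written on explicit real and imaginary parts. */
complex float f_split(const complex float x[])
{
    float re = 1.0f;  /* real part of the running product */
    float im = 0.0f;  /* imaginary part of the running product */

    for (int i = 0; i < 32; i++) {
        float xr = crealf(x[i]);
        float xi = cimagf(x[i]);

        /* (re + i*im) * (xr + i*xi) */
        float t = re * xr - im * xi;  /* new real part */
        im = re * xi + im * xr;       /* new imaginary part */
        re = t;
    }
    return re + im * I;
}
```

Each iteration needs two multiplies plus one fused multiply-add and one fused multiply-subtract, which lines up with the `vmulss`, `vmulss`, `vfmadd132ss`, `vfmsub132ss` sequence in the loop body above. Each step depends on the previous `re` and `im`, so the loop as written is a serial chain.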