I have the following C++ code that evaluates a Chebyshev polynomial using Clenshaw's algorithm:

    void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)
    {
      #pragma omp simd
      for (int i=0;i<m;i++){
        double x = xs[i];
        double u0=0,u1=0,u2=0;
        for (int k=n;k>=0;k--){
          u2 = u1;
          u1 = u0;
          u0 = 2*x*u1-u2+coeffs[k];
        }
        ys[i] = 0.5*(coeffs[0]+u0-u2);
      }
    }

I'm hoping for an autovectorization of the outer loop so that the inner loop operates on vectors. When compiled with

    g++ -march=haswell -O3 -fopt-info-vec-missed -S chebyshev.cc

using g++ 6.3.0, no vectorization happens. I get the messages:

    chebyshev.cc:11:17: note: not vectorized: control flow in loop.
    chebyshev.cc:11:17: note: bad loop form.
    chebyshev.cc:14:19: note: intermediate value used outside loop.
    chebyshev.cc:14:19: note: Unknown def-use cycle pattern.
    chebyshev.cc:14:19: note: reduction used in loop.
    chebyshev.cc:14:19: note: Unknown def-use cycle pattern.
    chebyshev.cc:14:19: note: Unsupported pattern.
    chebyshev.cc:14:19: note: Unsupported pattern.
    chebyshev.cc:14:19: note: not vectorized: unsupported use in stmt.
    chebyshev.cc:14:19: note: unexpected pattern.
    chebyshev.cc:11:17: note: not vectorized: not enough data-refs in basic block.
    chebyshev.cc:21:1: note: not vectorized: not enough data-refs in basic block.
    chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
    chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
    chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
    chebyshev.cc:11:17: note: not consecutive access _27 = *coeffs_20(D);
    chebyshev.cc:11:17: note: not vectorized: no grouped stores in basic block.

On the same code, icc vectorizes the outer loop as expected.

I was wondering if there are small ways in which I can change my code to help gcc's autovectorizer succeed. I would also appreciate any pointers to documentation or gcc source that can help me better understand how gcc's autovectorization of outer loops works.

Regards,
Jyotirmoy Bhattacharya

PS. The interesting part of icc's assembler output is:

    ..B1.4:                         # Preds ..B1.8 ..B1.3
            xorl      %r15d, %r15d                      #14.5
            xorl      %ebx, %ebx                        #14.21
            testq     %rsi, %rsi                        #14.21
            vmovupd   (%rdx,%r9,8), %ymm3               #12.16
            vxorpd    %ymm5, %ymm5, %ymm5               #13.14
            vmovdqa   %ymm1, %ymm4                      #13.19
            vmovdqa   %ymm1, %ymm2                      #13.24
            jl        ..B1.8        # Prob 2%           #14.21
    ..B1.5:                         # Preds ..B1.4
            vaddpd    %ymm3, %ymm3, %ymm3               #17.14
    ..B1.6:                         # Preds ..B1.6 ..B1.5
            vmovapd   %ymm4, %ymm2                      #20.3
            incq      %r15                              #14.5
            vmovapd   %ymm5, %ymm4                      #20.3
            vfmsub213pd %ymm2, %ymm3, %ymm5             #17.19
            vbroadcastsd (%r11,%rbx,8), %ymm6           #17.22
            decq      %rbx
            vaddpd    %ymm5, %ymm6, %ymm5               #17.22
            cmpq      %r10, %r15                        #14.5
            jb        ..B1.6        # Prob 82%          #14.5
    ..B1.8:                         # Preds ..B1.6 ..B1.4
            vbroadcastsd (%rdi), %ymm3                  #19.18
            vaddpd    %ymm3, %ymm5, %ymm4               #19.28
            vsubpd    %ymm2, %ymm4, %ymm2               #19.31
            vmulpd    %ymm2, %ymm0, %ymm5               #19.31
            vmovupd   %ymm5, (%rcx,%r9,8)               #19.5
            addq      $4, %r9                           #11.3
            cmpq      %r8, %r9                          #11.3
            jb        ..B1.4        # Prob 82%          #11.3
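
In case it helps anyone reproduce this, here is a minimal sketch of a self-contained check for the cheby_eval routine from my question. It compares the Clenshaw result against the direct sum of coeffs[k]*T_k(x), using T_k(x) = cos(k*acos(x)) for |x| <= 1. The helpers cheby_direct and max_clenshaw_error, the sample coefficients, and the sample points are made up for illustration only; they are not part of my real code. Note that coeffs is assumed to hold n+1 values (k = 0..n).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same routine as in the question; coeffs must hold n+1 values (k = 0..n).
void cheby_eval(double *coeffs, int n, double *xs, double *ys, int m)
{
  #pragma omp simd
  for (int i = 0; i < m; i++) {
    double x = xs[i];
    double u0 = 0, u1 = 0, u2 = 0;
    for (int k = n; k >= 0; k--) {
      u2 = u1;
      u1 = u0;
      u0 = 2*x*u1 - u2 + coeffs[k];
    }
    ys[i] = 0.5*(coeffs[0] + u0 - u2);
  }
}

// Hypothetical helper: direct evaluation of sum_{k=0..n} coeffs[k]*T_k(x),
// using T_k(x) = cos(k*acos(x)), which is valid for |x| <= 1.
double cheby_direct(const double *coeffs, int n, double x)
{
  assert(n >= 0 && std::fabs(x) <= 1.0);
  double s = 0;
  for (int k = 0; k <= n; k++)
    s += coeffs[k] * std::cos(k * std::acos(x));
  return s;
}

// Hypothetical helper: largest absolute difference between the two
// evaluations over a few illustrative sample points in [-1, 1].
double max_clenshaw_error()
{
  std::vector<double> coeffs = {1.0, -0.5, 0.25, 0.125};  // made-up values
  const int n = static_cast<int>(coeffs.size()) - 1;
  const int m = 9;
  std::vector<double> xs(m), ys(m);
  for (int i = 0; i < m; i++)
    xs[i] = -1.0 + 0.25 * i;  // -1, -0.75, ..., 1
  cheby_eval(coeffs.data(), n, xs.data(), ys.data(), m);
  double err = 0;
  for (int i = 0; i < m; i++)
    err = std::fmax(err, std::fabs(ys[i] - cheby_direct(coeffs.data(), n, xs[i])));
  return err;
}
```

Calling max_clenshaw_error() from a small main and printing the result should give a value near rounding error. The check only confirms that the scalar routine computes what I intend; it says nothing about whether a given compiler vectorizes it.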