Fwd: autovectorization of outer loop

I have the following C++ code that evaluates a Chebyshev polynomial
using Clenshaw's algorithm:

void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)
{
  #pragma omp simd
  for (int i=0;i<m;i++){
    double x = xs[i];
    double u0=0,u1=0,u2=0;
    for (int k=n;k>=0;k--){
      u2 = u1;
      u1 = u0;
      u0 = 2*x*u1-u2+coeffs[k];
    }
    ys[i] = 0.5*(coeffs[0]+u0-u2);
  }
}
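
A minimal sketch of how the routine above could be sanity-checked,
assuming arguments in [-1, 1] so that T_k(x) = cos(k*acos(x)) applies.
The helper names cheby_direct and cheby_max_error and the test
coefficients are made up for illustration. The comparison also assumes,
as the code implies, that coeffs[0] enters the sum with full weight.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The routine from the post, repeated so the sketch compiles on its own.
// The omp simd pragma is left out here; it does not change the results.
void cheby_eval(double *coeffs, int n, double *xs, double *ys, int m)
{
  for (int i = 0; i < m; i++) {
    double x = xs[i];
    double u0 = 0, u1 = 0, u2 = 0;
    for (int k = n; k >= 0; k--) {
      u2 = u1;
      u1 = u0;
      u0 = 2 * x * u1 - u2 + coeffs[k];
    }
    ys[i] = 0.5 * (coeffs[0] + u0 - u2);
  }
}

// Hypothetical reference: sum over k of coeffs[k]*T_k(x), using
// T_k(x) = cos(k*acos(x)), which holds for x in [-1, 1].
double cheby_direct(const double *coeffs, int n, double x)
{
  assert(n >= 0 && x >= -1.0 && x <= 1.0);
  double theta = std::acos(x);
  double sum = 0.0;
  for (int k = 0; k <= n; k++)
    sum += coeffs[k] * std::cos(k * theta);
  return sum;
}

// Largest absolute difference between the two evaluations on a small grid.
double cheby_max_error()
{
  std::vector<double> coeffs = {1.0, -0.5, 0.25, 2.0, -1.5};  // arbitrary test data
  int n = static_cast<int>(coeffs.size()) - 1;
  std::vector<double> xs;
  for (int j = -10; j <= 10; j++)
    xs.push_back(j / 10.0);  // 21 points from -1 to 1
  std::vector<double> ys(xs.size());
  cheby_eval(coeffs.data(), n, xs.data(), ys.data(),
             static_cast<int>(xs.size()));
  double worst = 0.0;
  for (std::size_t i = 0; i < xs.size(); i++) {
    double err = std::fabs(ys[i] - cheby_direct(coeffs.data(), n, xs[i]));
    if (err > worst)
      worst = err;
  }
  return worst;
}
```

Calling cheby_max_error() from a small main and printing the result
should give a value near rounding error if the recurrence and the
final combination are consistent.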

I'm hoping for an autovectorization of the outer loop so that the
inner loop operates on vectors.
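
To make the intended transformation concrete, here is a hand-blocked
sketch of what "the inner loop operates on vectors" would mean. It is
only an illustration, not a claim about what either compiler emits.
The name cheby_eval_blocked and the width W = 4 (four doubles per
256-bit register) are assumptions.

```cpp
#include <cassert>

// Hypothetical hand-blocked variant: W consecutive x values are carried
// through the k-loop together, so each statement in the k-loop acts on a
// W-element "vector" and coeffs[k] is a scalar shared by all lanes.
void cheby_eval_blocked(const double *coeffs, int n, const double *xs,
                        double *ys, int m)
{
  assert(n >= 0 && m >= 0);
  const int W = 4;  // assumed lane count: four doubles per 256-bit register
  int i = 0;
  for (; i + W <= m; i += W) {
    double x2[W], u0[W], u1[W], u2[W];
    for (int l = 0; l < W; l++) {
      x2[l] = 2 * xs[i + l];  // 2*x hoisted out of the k-loop
      u0[l] = u1[l] = u2[l] = 0;
    }
    for (int k = n; k >= 0; k--) {
      double c = coeffs[k];  // one scalar load, broadcast to every lane
      for (int l = 0; l < W; l++) {
        u2[l] = u1[l];
        u1[l] = u0[l];
        u0[l] = x2[l] * u1[l] - u2[l] + c;
      }
    }
    for (int l = 0; l < W; l++)
      ys[i + l] = 0.5 * (coeffs[0] + u0[l] - u2[l]);
  }
  // Scalar remainder for the last m % W points.
  for (; i < m; i++) {
    double x = xs[i];
    double u0 = 0, u1 = 0, u2 = 0;
    for (int k = n; k >= 0; k--) {
      u2 = u1;
      u1 = u0;
      u0 = 2 * x * u1 - u2 + coeffs[k];
    }
    ys[i] = 0.5 * (coeffs[0] + u0 - u2);
  }
}
```

The loops over l have a fixed trip count and no dependence between
lanes, which is the property the outer-loop vectorizer has to discover
on its own in the original form.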

When compiled with

g++ -march=haswell -O3 -fopt-info-vec-missed -S chebyshev.cc

using g++ 6.3.0, no vectorization happens and I get the messages:

chebyshev.cc:11:17: note: not vectorized: control flow in loop.
chebyshev.cc:11:17: note: bad loop form.
chebyshev.cc:14:19: note: intermediate value used outside loop.
chebyshev.cc:14:19: note: Unknown def-use cycle pattern.
chebyshev.cc:14:19: note: reduction used in loop.
chebyshev.cc:14:19: note: Unknown def-use cycle pattern.
chebyshev.cc:14:19: note: Unsupported pattern.
chebyshev.cc:14:19: note: Unsupported pattern.
chebyshev.cc:14:19: note: not vectorized: unsupported use in stmt.
chebyshev.cc:14:19: note: unexpected pattern.
chebyshev.cc:11:17: note: not vectorized: not enough data-refs in basic block.
chebyshev.cc:21:1: note: not vectorized: not enough data-refs in basic block.
chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
chebyshev.cc:14:19: note: not vectorized: not enough data-refs in basic block.
chebyshev.cc:11:17: note: not consecutive access _27 = *coeffs_20(D);
chebyshev.cc:11:17: note: not vectorized: no grouped stores in basic block.

On the same code, icc vectorizes the outer loop as expected.

I was wondering if there are small ways in which I can change my code
to help gcc's autovectorizer to succeed. I would also appreciate any
pointers to documentation or gcc source that can help me better
understand how gcc's autovectorization of outer loops works.

Regards,
Jyotirmoy Bhattacharya

PS. The interesting part of icc's assembler output is:

..B1.4:                         # Preds ..B1.8 ..B1.3
        xorl      %r15d, %r15d                                  #14.5
        xorl      %ebx, %ebx                                    #14.21
        testq     %rsi, %rsi                                    #14.21
        vmovupd   (%rdx,%r9,8), %ymm3                           #12.16
        vxorpd    %ymm5, %ymm5, %ymm5                           #13.14
        vmovdqa   %ymm1, %ymm4                                  #13.19
        vmovdqa   %ymm1, %ymm2                                  #13.24
        jl        ..B1.8        # Prob 2%                       #14.21

..B1.5:                         # Preds ..B1.4
        vaddpd    %ymm3, %ymm3, %ymm3                           #17.14

..B1.6:                         # Preds ..B1.6 ..B1.5
        vmovapd   %ymm4, %ymm2                                  #20.3
        incq      %r15                                          #14.5
        vmovapd   %ymm5, %ymm4                                  #20.3
        vfmsub213pd %ymm2, %ymm3, %ymm5                         #17.19
        vbroadcastsd (%r11,%rbx,8), %ymm6                       #17.22
        decq      %rbx
        vaddpd    %ymm5, %ymm6, %ymm5                           #17.22
        cmpq      %r10, %r15                                    #14.5
        jb        ..B1.6        # Prob 82%                      #14.5

..B1.8:                         # Preds ..B1.6 ..B1.4
        vbroadcastsd (%rdi), %ymm3                              #19.18
        vaddpd    %ymm3, %ymm5, %ymm4                           #19.28
        vsubpd    %ymm2, %ymm4, %ymm2                           #19.31
        vmulpd    %ymm2, %ymm0, %ymm5                           #19.31
        vmovupd   %ymm5, (%rcx,%r9,8)                           #19.5
        addq      $4, %r9                                       #11.3
        cmpq      %r8, %r9                                      #11.3
        jb        ..B1.4        # Prob 82%                      #11.3