Does -floop-nest-optimize ever work usefully?

Dave Love via Gcc-help <gcc-help@xxxxxxxxxxx> · Mon, 27 Sep 2021 17:01:26 +0100

[I don't know if this merits a bug report, but I expect there's
something worth understanding anyway.]

I'd like to have polyhedral-type optimizations available, but I've never
been able to get -floop-nest-optimize to do anything useful with various
GCC releases.  (I realize it's always been marked experimental.)  Does
it ever actually do anything other than pessimize loop nests, if only
because it stops vectorization?  If so, what's the trick?

For instance, consider the matmul (dgemm) example from Pluto
<https://raw.githubusercontent.com/bondhugula/pluto/master/examples/matmul/matmul.c>:

  for (i = 0; i < M; i++)
    for (j = 0; j < N; j++)
      for (k = 0; k < K; k++)
        C[i][j] = beta * C[i][j] + alpha * A[i][k] * B[k][j];
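A self-contained version of the nest, for anyone wanting to reproduce the diagnostics, might look like the sketch below. It makes several assumptions that are not taken from Pluto's matmul.c: the sizes (256), the helper names `init_matrices` and `check_result`, and the initial values. The arrays are declared as fixed-size globals, so every subscript is plainly affine in i, j and k. Whether Graphite actually accepts this variant has not been tested here.

```c
#include <assert.h>

/* Hypothetical sizes; Pluto's example defines its own M, N and K. */
#define M 256
#define N 256
#define K 256

/* Fixed-size global arrays: each subscript is an affine function of the
   loop indices, which is the form polyhedral tools such as Graphite model. */
static double A[M][K], B[K][N], C[M][N];

/* Illustrative initialisation (hypothetical): constant A and B, zeroed C. */
void init_matrices(double a_val, double b_val)
{
    for (int i = 0; i < M; i++)
        for (int k = 0; k < K; k++)
            A[i][k] = a_val;
    for (int k = 0; k < K; k++)
        for (int j = 0; j < N; j++)
            B[k][j] = b_val;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;
}

/* The loop nest quoted above, unchanged. */
void matmul(double alpha, double beta)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < K; k++)
                C[i][j] = beta * C[i][j] + alpha * A[i][k] * B[k][j];
}

/* With beta = 1 and C initially zero, every element should equal
   K * alpha * a_val * b_val (exact when those are small integers), which
   gives a quick check that an optimized build still computes the same thing. */
void check_result(double alpha, double a_val, double b_val)
{
    const double expected = K * alpha * a_val * b_val;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            assert(C[i][j] == expected);
}
```

Compiling this with `gcc -Ofast -c`, with and without `-floop-nest-optimize`, and adding `-fopt-info-missed` should show what Graphite and the vectorizer report for the nest. The functions have external linkage so that they are not discarded when the file is compiled on its own.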

If I use -Ofast -floop-nest-optimize, Graphite fails:

  matmul.c:73:42: missed: failed: evolution of offset is not affine.

but there's a drastic pessimization (compared with just -Ofast):

  matmul.c:73:42: missed: couldn't vectorize loop

With -O2 -ffast-math -ftree-loop-vectorize -floop-nest-optimize it does
report that the nest was optimized, but vectorization still fails, with an
extra message:

  matmul.c:73:42: missed: not vectorized: no vectype for stmt: _17 = A[_62][_61];
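For comparison, one transformation that can be done by hand, independently of Graphite, is interchanging the j and k loops. This makes the innermost loop a unit-stride update along rows of B and C, a form GCC's vectorizer generally handles well. The sketch below is only an illustration. It is not what Graphite emits and not a diagnosis of the message above. The sizes, helper names and test data are all hypothetical.

```c
#include <assert.h>

/* Hypothetical sizes for illustration. */
#define M 128
#define N 128
#define K 128

static double A[M][K], B[K][N], C_ref[M][N], C_ikj[M][N];

/* Original i-j-k order: the innermost loop is a reduction into C[i][j]
   and walks B down a column (stride N doubles). */
void matmul_ijk(double C[M][N], double alpha, double beta)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < K; k++)
                C[i][j] = beta * C[i][j] + alpha * A[i][k] * B[k][j];
}

/* Interchanged i-k-j order: the innermost loop is unit-stride over j in
   both C and B, and alpha * A[i][k] is hoisted out of it.  Each C[i][j]
   still receives its k updates in increasing order, so this is legal. */
void matmul_ikj(double C[M][N], double alpha, double beta)
{
    for (int i = 0; i < M; i++)
        for (int k = 0; k < K; k++) {
            const double a = alpha * A[i][k];
            for (int j = 0; j < N; j++)
                C[i][j] = beta * C[i][j] + a * B[k][j];
        }
}

/* Small integer-valued inputs keep every product and sum exact, so the two
   orders can be compared with == even under -Ofast. */
void fill_inputs(void)
{
    for (int i = 0; i < M; i++)
        for (int k = 0; k < K; k++)
            A[i][k] = (double)((i + 2 * k) % 7);
    for (int k = 0; k < K; k++)
        for (int j = 0; j < N; j++)
            B[k][j] = (double)((3 * k + j) % 5);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            C_ref[i][j] = C_ikj[i][j] = (double)((i + j) % 3);
}

/* Runs both orders on the same inputs and asserts they agree exactly. */
void check_orders_agree(double alpha, double beta)
{
    fill_inputs();
    matmul_ijk(C_ref, alpha, beta);
    matmul_ikj(C_ikj, alpha, beta);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            assert(C_ref[i][j] == C_ikj[i][j]);
}
```

The exact-equality check relies on integer-valued data. With arbitrary doubles, the two orders perform the same operations per element, but options such as -ffast-math allow the compiler to reorder them.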

The above is on SKX with GCC 11, with or without -march=native.  On
ppc64le, -floop-nest-optimize didn't seem to kick in.

The results with Pluto+GCC, or Clang with Polly, are much better than
with gcc -Ofast: both generate a five-level loop nest, with default tiling.
(I also tried xlc on ppc64le, but couldn't find a way to stop it
pattern-matching the nest into a call to an external dgemm...)
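To give a feel for what tiling does to this nest, here is a hand-tiled variant. It has three tile loops over i, j and k, followed by an i-k-j order inside each tile. This is a hypothetical sketch, not the code Pluto or Polly generate: theirs is a five-level nest according to the report above, while this one has six loops. The tile size of 32, the sizes and the helper names are all assumptions.

```c
#include <assert.h>

/* Hypothetical sizes, deliberately not multiples of the tile size. */
#define M 100
#define N 70
#define K 90
/* Hypothetical tile size. */
#define T 32

static double A[M][K], B[K][N], C_ref[M][N], C_tiled[M][N];

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference: the original i-j-k nest. */
void matmul_ref(double alpha, double beta)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < K; k++)
                C_ref[i][j] = beta * C_ref[i][j] + alpha * A[i][k] * B[k][j];
}

/* Tiled: tile loops over (i, j, k), then i-k-j within each tile so the
   innermost loop is unit-stride.  For every C[i][j], k still runs in
   increasing order (tile by tile, then within the tile), so the tiled nest
   performs the same updates in the same order as the reference. */
void matmul_tiled(double alpha, double beta)
{
    for (int ii = 0; ii < M; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < K; kk += T)
                for (int i = ii; i < min_int(ii + T, M); i++)
                    for (int k = kk; k < min_int(kk + T, K); k++) {
                        const double a = alpha * A[i][k];
                        for (int j = jj; j < min_int(jj + T, N); j++)
                            C_tiled[i][j] = beta * C_tiled[i][j] + a * B[k][j];
                    }
}

/* Integer-valued inputs keep all arithmetic exact. */
void fill_inputs(void)
{
    for (int i = 0; i < M; i++)
        for (int k = 0; k < K; k++)
            A[i][k] = (double)((i + 2 * k) % 7);
    for (int k = 0; k < K; k++)
        for (int j = 0; j < N; j++)
            B[k][j] = (double)((3 * k + j) % 5);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            C_ref[i][j] = C_tiled[i][j] = (double)((i + j) % 3);
}

/* Asserts that the tiled and reference nests agree element by element. */
void check_tiled_matches(double alpha, double beta)
{
    fill_inputs();
    matmul_ref(alpha, beta);
    matmul_tiled(alpha, beta);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            assert(C_ref[i][j] == C_tiled[i][j]);
}
```

The `min_int` bounds handle partial tiles at the edges. That is why the sizes were chosen not to divide evenly by T.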

Thanks for any insight.