Re: Invariant is not moved out of loop

Astor Piaz <appiazzolla@xxxxxxxxx> · Tue, 1 Aug 2017 16:25:41 +0200

One last question, if you would allow me to extend a little bit...

Using the code that is at the end of this e-mail, and the following flags:

-Ofast -funroll-loops --param max-completely-peeled-insns=10000
--param max-completely-peel-times=10000

The instructions executed by evaluate(...) are less than
evaluate_fact(...) as shown by the callgrind output:

callgrind.out.13969:fn=(1078) double evaluate_fact<20>(double const*,
double const*)
callgrind.out.13969-0 25334
callgrind.out.13969-
--
callgrind.out.13970:fn=(1078) double evaluate<20>(double const*,
double const*)
callgrind.out.13970-0 17636
callgrind.out.13970-

Why could this be happening? What optimizations could be hindered by
applying the distributive law by hand?

Thanks in advance to all.

Best,
--
Astor

template<int k>
double __attribute__ ((noinline)) evaluate(const double u[], const double phi[])
{
  double f = 0.;
  for (int i3 = 0;i3<k;++i3)
    for (int i2 = 0;i2<k;++i2)
      for (int i1 = 0;i1<k;++i1)
        f += u[i1+k*(i2+k*i3)] * phi[i1] * phi[i2] * phi[i3];
  return f;
}

template<int k>
double __attribute__ ((noinline)) evaluate_fact(const double u[],
const double phi[])
{
  double f3 = 0.;
  for (int i3 = 0;i3<k;++i3)
    {
      const int k3 = k*i3;
      double f2 = 0.;
      for (int i2 = 0;i2<k;++i2)
        {
          const int k2 = k*(i2+k3);
          double f1 = 0.;
          for (int i1 = 0;i1<k;++i1)
            {
              f1 += u[i1+k2] * phi[i1];
            }
          f2 += f1 * phi[i2];
        }
      f3 += f2 * phi[i3];
    }
  return f3;
}

int main()
{
  const static unsigned int k=20;
  const static double u[k*k*k] = {3.14} ;
  const static double phi[k] = {3.14} ;
  double e = 0. ;
#ifndef FACTOR
  e += evaluate<k>(u, phi);
#else
  e += evaluate_fact<k>(u, phi);
#endif
  return static_cast<int>(e);
}

On Thu, Jul 27, 2017 at 7:25 AM, Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx> wrote:
> On 2017-07-25 18:04 +0200, Astor Piaz wrote:
>> Hi Alexander,
>>
>> Thank you, your answer is pretty clear and your example is much better.
>>
>> Could I ask you though, why wouldn't this be allowed once the user has
>> requested -fassociative-math ?
>
> Alexander said this is not implemented, but it should be allowed. For
> integer operands, this is allowed by ISO C/C++ standards.  For floating-
> point operands, this should be allowed by -fassociative-math.
>
>> Is there a fundamental problem to provide this optimization or it is
>> just too difficult?
>
> I don't know, but it seems no existing compilers has implemented it yet.
>
>> Thanks a lot.
>>
>> Best,
>> --
>> Astor
>>
>> On Tue, Jul 25, 2017 at 5:11 PM, Alexander Monakov <amonakov@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I think what you're asking is not the "usual" loop invariant motion, but rather
>> > applying distributive law to reductions. In your example you want the optimizer
>> > to replace
>> >
>> >   R = 0;
>> >   for ( ... )
>> >     R += X * C;
>> >
>> > by
>> >
>> >   R = 0;
>> >   for ( ... )
>> >     R += X;
>> >   R *= C;
>> >
>> > We don't do that even for integer operands, the following isn't optimized either:
>> > (neither do Clang and ICC according to my experiments on gcc.godbolt.org)
>> >
>> >   int f(int *a)
>> >   {
>> >     int r = 0;
>> >     for (int i = 0; i < 1024; i++)
>> >       r += a[i] * 5;
>> >     return r;
>> >   }
>> >
>> > Alexander
> --
> Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx>
> School of Aerospace Science and Technology, Xidian University