AW: new ira optimization - adding a loop to ira

<stefan@xxxxxxxxx> · Fri, 13 Sep 2019 14:58:49 +0200

> -----Original Message-----
> From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> Sent: Friday, 13 September 2019 12:58
> To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Subject: AW: new ira optimization - adding a loop to ira
> 
> > -----Original Message-----
> > From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> > Sent: Friday, 13 September 2019 12:45
> > To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> > Cc: gcc-help@xxxxxxxxxxx
> > Subject: AW: new ira optimization - adding a loop to ira
> >
> > > -----Original Message-----
> > > From: Richard Sandiford <richard.sandiford@xxxxxxx>
> > > Sent: Friday, 13 September 2019 12:16
> > > To: stefan@xxxxxxxxx
> > > Cc: gcc-help@xxxxxxxxxxx
> > > Subject: Re: new ira optimization - adding a loop to ira
> > >
> > > <stefan@xxxxxxxxx> writes:
> > > > I'm working on a new optimization that gets rid of spilled tmp
> > > > variables (e.g. introduced by pre) by using the source mem ref
> > > > instead of a stack slot.
> > > >
> > > > To do this, I added a loop into ira.c:ira()
> > > >
> > > >   init_prune_stack_vars ();
> > > >   do
> > > >     {
> > > > #ifndef IRA_NO_OBSTACK
> > > >       gcc_obstack_init (&ira_obstack);
> > > > #endif
> > > >       bitmap_obstack_initialize (&ira_bitmap_obstack);
> > > >
> > > > ...
> > > >
> > > >       ira_color ();
> > > >
> > > >     }
> > > >   while (flag_prune_stack_vars && prune_stack_vars ());
> > > >
> > > > To get it to work, the prune_stack_vars function resets some data.
> > > > This is mostly working - but on some source files, it fails due to
> > > > invalid reg_equivs.
> > > > This also happens if the optimizer does nothing and just loops
> > > > once.
> > > >
> > > > Currently I'm calling this before looping again:
> > > >
> > > >       regstat_free_n_sets_and_refs ();
> > > >       regstat_free_ri ();
> > > >       loop_optimizer_finalize ();
> > > >       free_dominance_info (CDI_DOMINATORS);
> > > >
> > > > Any hint as to what I'm missing to reset?
> > >
> > > I can't see anything obviously missing.  What kind of failure do you
> > > see?  E.g. do you get an internal compiler error or does the compiler
> > > generate incorrect code?
> > >
> > > Do you see the failure on an in-tree test case?  FWIW, I just tried
> > > looping like this locally and didn't see any failures for the tests
> > > I tried.  But I was obviously testing without the new optimisation,
> > > and so each loop iteration should just repeat what the previous one
> > > did.
> > >
> > > Not related to the failure, but: do you do anything with the
> > > obstacks when looping again?  Including the initialisations in the
> > > loop as above would introduce a memory leak if you don't do anything
> > > to free the contents.  It'd probably be better to initialise outside
> > > the loop unless you're really confident that no data is carried
> > > across iterations.
> > >
> > > Thanks,
> > > Richard
> >
> > Thanks for the ira_obstack hint - I will take care of this once the
> > loop mode is working - maybe I can start looping later or I'll free
> > the memory.
> >
> > In reload, this assertion in push_reload(...) fails:
> >
> >       gcc_assert (regno < FIRST_PSEUDO_REGISTER
> > 		  || reg_renumber[regno] >= 0
> > 		  || reg_equiv_constant (regno) == NULL_RTX);
> >
> > I already know that it's the reg_equiv_constant check and that this
> > reg_equiv_constant is also set in the unpatched code.
> >
> > So I am looking into why these additional reloads occur.  There are
> > additional reloads if I enable the loop, interestingly for UIDs like
> > 2, 3, 4 ...
> >
> > Thanks,
> > Stefan
> 
> 
> The difference is the additional expr_list, which causes the reload:
> 
> (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
>         (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
>                 (const_int 16 [0x10])) [178 this+0 S4 A16])) engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
>      (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
>                 (const_int 16 [0x10])) [178 this+0 S4 A16])
>         (nil)))
> 
> => I'll add some code to drop the expr_list from all insns...

I took a wrong turn:

A normal ira pass changes REG_EQUAL notes to REG_EQUIV notes. This sets the reg_equiv and causes the failure during reload...

=> I added code to record all insn/REG_EQUAL-note pairs
=> and restore these if the loop is run again - dropping the REG_EQUIV notes.

And this issue went away.
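
As a rough, standalone illustration of the record/restore idea (a toy model, not the actual patch: the types and the names record_equal_notes, promote_equal_notes and restore_equal_notes are made up here and are not GCC's rtx API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for insns and their equivalence notes.  */
enum note_kind { NOTE_NONE, NOTE_EQUAL, NOTE_EQUIV };

struct insn
{
  int uid;
  enum note_kind note;   /* at most one equivalence note per insn here */
};

#define MAX_SAVED 16
static struct insn *saved_equal[MAX_SAVED];
static size_t n_saved_equal;

/* Before the first allocation attempt: remember every insn whose note
   is REG_EQUAL at that point.  */
static void
record_equal_notes (struct insn *insns, size_t n)
{
  n_saved_equal = 0;
  for (size_t i = 0; i < n; i++)
    if (insns[i].note == NOTE_EQUAL)
      {
        assert (n_saved_equal < MAX_SAVED);
        saved_equal[n_saved_equal++] = &insns[i];
      }
}

/* Stand-in for the allocation pass turning REG_EQUAL into REG_EQUIV.  */
static void
promote_equal_notes (struct insn *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i].note == NOTE_EQUAL)
      insns[i].note = NOTE_EQUIV;
}

/* Before looping again: the recorded notes become REG_EQUAL again.
   Notes that were REG_EQUIV from the start are left alone.  */
static void
restore_equal_notes (void)
{
  for (size_t i = 0; i < n_saved_equal; i++)
    saved_equal[i]->note = NOTE_EQUAL;
}
```

The point of recording the pairs up front is that after one pass a promoted note can no longer be told apart from one that was REG_EQUIV all along.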

Plus I moved the loop start further down, so the ira_obstack is only initialized once:

  init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();

=> I can continue to work on the optimizer itself.
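
The resulting loop shape (one-time setup outside, per-iteration state rebuilt inside, teardown only when looping again) can be sketched in isolation. This is a minimal model under assumptions: the counters and the names run_alloc_loop and prune_twice are hypothetical, and the comments only point at which calls from the mail each step loosely corresponds to.

```c
#include <stdbool.h>

struct pass_stats
{
  int arena_inits;     /* one-time setup, like the obstack init */
  int iter_setups;     /* per-iteration state, like init_reg_equiv () */
  int iter_teardowns;  /* resets done before looping again */
  int iterations;
};

/* prune () reports whether it changed something; only then is the
   allocation repeated.  With enabled == false the body runs once.  */
static struct pass_stats
run_alloc_loop (bool enabled, bool (*prune) (int iteration))
{
  struct pass_stats s = { 0, 0, 0, 0 };
  bool again;

  s.arena_inits++;              /* outside the loop: initialised once */
  do
    {
      s.iter_setups++;          /* rebuilt on every iteration */
      s.iterations++;           /* ... allocation and colouring ... */
      again = enabled && prune (s.iterations);
      if (again)
        s.iter_teardowns++;     /* e.g. regstat_free_ri () and friends */
    }
  while (again);
  return s;
}

/* Example callback: asks for a rerun after iterations 1 and 2.  */
static bool
prune_twice (int iteration)
{
  return iteration < 3;
}
```

With prune_twice the body runs three times while the one-time setup still happens only once, which is what avoids the leak mentioned above.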

To provide an example:

void transformVector(double* restrict inputVector, double const transformMatrix[4][4], double* restrict outputVector)
{
    for(int k = 0; k < 900; k++)
    {
        double x = *inputVector++;
        double y = *inputVector++;
        double z = *inputVector++;

        for(int l = 0; l < 3; l++){
            double res =  transformMatrix[l][0] * x;
            res +=  transformMatrix[l][1] * y;
            res +=  transformMatrix[l][2] * z;
            res +=  transformMatrix[l][3];
            *outputVector++ = res;
        }
    }
}

m68k-amigaos-gcc -m68080 -O3 x.c -S

yields:

#NO_APP
        .text
        .align  2
        .globl  _transformVector
_transformVector:
        link.w a5,#-88
        move.l (16,a5),a0
        move.l (8,a5),a1
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a4/a3/a2,-(sp)
        move.l (12,a5),a2
        move.l (a2)+,(-16,a5)
        move.l (a2)+,(-12,a5)
        lea (21600,a0),a4
        fdmove.d (a2)+,fp7
        move.l (a2)+,(-8,a5)
        move.l (a2)+,(-4,a5)
        move.l (a2)+,(-24,a5)
        move.l (a2)+,(-20,a5)
        move.l (a2)+,(-32,a5)
        move.l (a2)+,(-28,a5)
        move.l (a2)+,(-40,a5)
        move.l (a2)+,(-36,a5)
        move.l (a2)+,(-48,a5)
        move.l (a2)+,(-44,a5)
        move.l (a2)+,(-56,a5)
        move.l (a2)+,(-52,a5)
        move.l (a2)+,(-64,a5)
        move.l (a2)+,(-60,a5)
        move.l (a2)+,(-72,a5)
        move.l (a2)+,(-68,a5)
        move.l (a2)+,(-80,a5)
        move.l (a2)+,(-76,a5)
        move.l (a2),(-88,a5)
        move.l (4,a2),(-84,a5)
.L2:
        fdmove.d (8,a1),fp0
        lea (24,a1),a3
        lea (24,a0),a2
        fdmove.d (a1),fp6
        move.l a3,a1
        fdmove.x fp0,fp4
        fdmove.d (-16,a5),fp2
        fdmul.x fp6,fp2
        fdmul.x fp7,fp4
...

And with the new option:

m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars

_transformVector:
        link.w a5,#0
        move.l (16,a5),a1
        move.l (12,a5),a0
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a6/a4/a3/a2,-(sp)
        move.l (8,a5),a2
        lea (21600,a1),a6
.L2:
        fdmove.d (a2),fp2
        lea (24,a2),a4
        lea (24,a1),a3
        fdmove.d (8,a2),fp0
        move.l a4,a2
        fdmove.x fp2,fp3
        fdmove.x fp0,fp5
        fdmul.d (a0),fp3
        fdmul.d (8,a0),fp5
...
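
Read at source level, the two listings differ roughly as follows. In the first one the twelve matrix entries used by the loop are copied out before the loop (the 88-byte frame holds eleven doubles, the twelfth stays in fp7); in the second one the multiplies read the matrix directly, as in fdmul.d (a0),fp3. The sketch below only illustrates that difference and checks that both shapes compute the same values; the names transform_hoisted and transform_direct and the count parameter are made up for this example.

```c
#include <assert.h>

/* Shape without the option: matrix entries are copied to temporaries
   up front (the copies that end up in stack slots).  */
static void
transform_hoisted (const double *in, const double m[4][4],
                   double *out, int count)
{
  double t[3][4];

  assert (count >= 0);
  for (int l = 0; l < 3; l++)
    for (int c = 0; c < 4; c++)
      t[l][c] = m[l][c];

  for (int k = 0; k < count; k++)
    {
      double x = *in++, y = *in++, z = *in++;
      for (int l = 0; l < 3; l++)
        *out++ = t[l][0] * x + t[l][1] * y + t[l][2] * z + t[l][3];
    }
}

/* Shape with the option: no copies, the original memory location is
   used as the operand each time.  */
static void
transform_direct (const double *in, const double m[4][4],
                  double *out, int count)
{
  assert (count >= 0);
  for (int k = 0; k < count; k++)
    {
      double x = *in++, y = *in++, z = *in++;
      for (int l = 0; l < 3; l++)
        *out++ = m[l][0] * x + m[l][1] * y + m[l][2] * z + m[l][3];
    }
}
```

In the original function the restrict qualifiers are what allow the loads to be hoisted in the first place, since stores through outputVector cannot alias the matrix.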

Btw: the code is not platform specific, so I guess it's generally useful.

Thanks
Stefan