Re: new ira optimization - adding a loop to ira

<stefan@xxxxxxxxx> · Fri, 20 Sep 2019 19:07:15 +0200

> -----Original Message-----
> From: gcc-help-owner@xxxxxxxxxxx <gcc-help-owner@xxxxxxxxxxx> On
> Behalf Of stefan@xxxxxxxxx
> Sent: Friday, 13 September 2019 14:59
> To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Subject: Re: new ira optimization - adding a loop to ira
> 
> > -----Original Message-----
> > From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> > Sent: Friday, 13 September 2019 12:58
> > To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> > Cc: gcc-help@xxxxxxxxxxx
> > Subject: Re: new ira optimization - adding a loop to ira
> >
> > > -----Original Message-----
> > > From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> > > Sent: Friday, 13 September 2019 12:45
> > > To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> > > Cc: gcc-help@xxxxxxxxxxx
> > > Subject: Re: new ira optimization - adding a loop to ira
> > >
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandiford@xxxxxxx>
> > > > Sent: Friday, 13 September 2019 12:16
> > > > To: stefan@xxxxxxxxx
> > > > Cc: gcc-help@xxxxxxxxxxx
> > > > Subject: Re: new ira optimization - adding a loop to ira
> > > >
> > > > <stefan@xxxxxxxxx> writes:
> > > > > I'm working on a new optimization to get rid of spilled tmp
> > > > > variables (e.g. introduced by pre) to use the source mem ref
> > > > > instead of a stack slot.
> > > > >
> > > > > To do this, I added a loop into ira.c:ira()
> > > > >
> > > > >   init_prune_stack_vars ();
> > > > >   do
> > > > >     {
> > > > > #ifndef IRA_NO_OBSTACK
> > > > >       gcc_obstack_init (&ira_obstack);
> > > > > #endif
> > > > >       bitmap_obstack_initialize (&ira_bitmap_obstack);
> > > > >
> > > > > ...
> > > > >
> > > > >       ira_color ();
> > > > >
> > > > >     }
> > > > >   while (flag_prune_stack_vars && prune_stack_vars ());
> > > > >
> > > > > To get this to work, the prune_stack_vars function resets
> > > > > some data. This mostly works, but on some source files it
> > > > > fails due to invalid reg_equivs. This also happens if the
> > > > > optimizer does nothing and just loops once.
> > > > >
> > > > > Currently I'm calling this before looping again:
> > > > >
> > > > >       regstat_free_n_sets_and_refs ();
> > > > >       regstat_free_ri ();
> > > > >       loop_optimizer_finalize ();
> > > > >       free_dominance_info (CDI_DOMINATORS);
> > > > >
> > > > > Any hint as to what I'm forgetting to reset?
> > > >
> > > > I can't see anything obviously missing.  What kind of failure do
> > > > you see?  E.g. do you get an internal compiler error or does the
> > > > compiler generate incorrect code?
> > > >
> > > > Do you see the failure on an in-tree test case?  FWIW, I just
> > > > tried looping like this locally and didn't see any failures for
> > > > the tests I tried.  But I was obviously testing without the new
> > > > optimisation, and so each loop iteration should just repeat what
> > > > the previous one did.
> > > >
> > > > Not related to the failure, but: do you do anything with the
> > > > obstacks when looping again?  Including the initialisations in
> > > > the loop as above would introduce a memory leak if you don't do
> > > > anything to free the contents.  It'd probably be better to
> > > > initialise outside the loop unless you're really confident that
> > > > no data is carried across iterations.
> > > >
> > > > Thanks,
> > > > Richard
> > >
> > > Thanks for the ira_obstack hint - I will take care of this once the
> > > loop mode is working - maybe I can start looping later or I'll free
> > > the memory.
> > >
> > > In reload, push_reload(...) fails on this assertion:
> > >
> > >       gcc_assert (regno < FIRST_PSEUDO_REGISTER
> > > 		  || reg_renumber[regno] >= 0
> > > 		  || reg_equiv_constant (regno) == NULL_RTX);
> > >
> > > I already know that it's the reg_equiv_constant check that fails and
> > > that this reg_equiv_constant is also set in the unpatched code.
> > >
> > > So I am looking into why these additional reloads occur. They appear
> > > if I enable the loop, interestingly for uids like 2, 3, 4 ...
> > >
> > > Thanks,
> > > Stefan
> >
> >
> > The difference is the additional expr_list, which causes the reload:
> >
> > (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
> >         (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> >                 (const_int 16 [0x10])) [178 this+0 S4 A16]))
> > engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
> >      (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> >                 (const_int 16 [0x10])) [178 this+0 S4 A16])
> >         (nil)))
> >
> > => I'll add some code to drop the expr_list from all insns...
> 
> I took a wrong turn:
> 
> A normal ira pass changes REG_EQUAL notes into REG_EQUIV notes. This
> sets the reg_equiv and causes the failure during reload...
> 
> => I added code to record all insn/REG_EQUAL-note pairs and to restore
> them if the loop is run again, dropping the REG_EQUIV notes.
> 
> And this issue went away.
> 
> Plus I moved the loop start further below, so the ira_obstack is only
> initialized once:
> 
>   init_prune_stack_vars ();
>   do
>     {
>       init_reg_equiv ();
> 
> 
> => I can continue to work on the optimizer itself.
> 
> To provide an example:
> 
> void transformVector( double* restrict inputVector,
>                       double const transformMatrix[4][4],
>                       double* restrict outputVector) {
>     for(int k = 0; k < 900; k++)
>     {
>         double x = *inputVector++;
>         double y = *inputVector++;
>         double z = *inputVector++;
> 
>         for(int l = 0; l < 3; l++){
>             double res =  transformMatrix[l][0] * x;
>             res +=  transformMatrix[l][1] * y;
>             res +=  transformMatrix[l][2] * z;
>             res +=  transformMatrix[l][3];
>             *outputVector++ = res;
>         }
>     }
> }
> 
> m68k-amigaos-gcc -m68080 -O3 x.c -S
> 
> yields:
> 
> #NO_APP
>         .text
>         .align  2
>         .globl  _transformVector
> _transformVector:
>         link.w a5,#-88
>         move.l (16,a5),a0
>         move.l (8,a5),a1
>         fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
>         movem.l a4/a3/a2,-(sp)
>         move.l (12,a5),a2
>         move.l (a2)+,(-16,a5)
>         move.l (a2)+,(-12,a5)
>         lea (21600,a0),a4
>         fdmove.d (a2)+,fp7
>         move.l (a2)+,(-8,a5)
>         move.l (a2)+,(-4,a5)
>         move.l (a2)+,(-24,a5)
>         move.l (a2)+,(-20,a5)
>         move.l (a2)+,(-32,a5)
>         move.l (a2)+,(-28,a5)
>         move.l (a2)+,(-40,a5)
>         move.l (a2)+,(-36,a5)
>         move.l (a2)+,(-48,a5)
>         move.l (a2)+,(-44,a5)
>         move.l (a2)+,(-56,a5)
>         move.l (a2)+,(-52,a5)
>         move.l (a2)+,(-64,a5)
>         move.l (a2)+,(-60,a5)
>         move.l (a2)+,(-72,a5)
>         move.l (a2)+,(-68,a5)
>         move.l (a2)+,(-80,a5)
>         move.l (a2)+,(-76,a5)
>         move.l (a2),(-88,a5)
>         move.l (4,a2),(-84,a5)
> .L2:
>         fdmove.d (8,a1),fp0
>         lea (24,a1),a3
>         lea (24,a0),a2
>         fdmove.d (a1),fp6
>         move.l a3,a1
>         fdmove.x fp0,fp4
>         fdmove.d (-16,a5),fp2
>         fdmul.x fp6,fp2
>         fdmul.x fp7,fp4
> ...
> 
> 
> And with the new option:
> 
> m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars
> 
> _transformVector:
>         link.w a5,#0
>         move.l (16,a5),a1
>         move.l (12,a5),a0
>         fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
>         movem.l a6/a4/a3/a2,-(sp)
>         move.l (8,a5),a2
>         lea (21600,a1),a6
> .L2:
>         fdmove.d (a2),fp2
>         lea (24,a2),a4
>         lea (24,a1),a3
>         fdmove.d (8,a2),fp0
>         move.l a4,a2
>         fdmove.x fp2,fp3
>         fdmove.x fp0,fp5
>         fdmul.d (a0),fp3
>         fdmul.d (8,a0),fp5
> ...
> 
> Btw: the code is not platform specific -> guess it's generally useful
> 
> Thanks
> Stefan

Ok, it's working now as it should. So if someone else needs to add a loop to ira(), insert it here:

  df_analyze ();

  if (flag_prune_stack_vars)
    init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();
...
      ira_color ();
    }
  while (flag_prune_stack_vars && prune_stack_vars ());

And if you modify something in your loop, you need to do this inside your function:

  if (touched)
    {
      /* make stats visible. */
      if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
	calculate_allocation_cost ();

      /* the lifetime of all registers must be reconsidered - reset what's needed. */
      regstat_free_n_sets_and_refs ();
      regstat_free_ri ();
      loop_optimizer_finalize ();
      free_dominance_info (CDI_DOMINATORS);

      /* plus restore the REG_EQUAL notes which were recorded during init_prune_stack_vars() ! */
      std::vector<std::pair<rtx_insn *, rtx> >::iterator i2r = insn2req_equals.begin();
      for (;i2r != insn2req_equals.end(); ++i2r)
	{
	  rtx_insn * insn = i2r->first;
	  REG_NOTES (insn) = i2r->second;
	}

      df_mark_solutions_dirty();
      df_analyze ();
    }

Maybe I should provide it as a patch?