> -----Original Message-----
> From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> Sent: Friday, 13 September 2019 12:58
> To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Subject: RE: new ira optimization - adding a loop to ira
>
> > -----Original Message-----
> > From: stefan@xxxxxxxxx <stefan@xxxxxxxxx>
> > Sent: Friday, 13 September 2019 12:45
> > To: 'Richard Sandiford' <richard.sandiford@xxxxxxx>
> > Cc: gcc-help@xxxxxxxxxxx
> > Subject: RE: new ira optimization - adding a loop to ira
> >
> > > -----Original Message-----
> > > From: Richard Sandiford <richard.sandiford@xxxxxxx>
> > > Sent: Friday, 13 September 2019 12:16
> > > To: stefan@xxxxxxxxx
> > > Cc: gcc-help@xxxxxxxxxxx
> > > Subject: Re: new ira optimization - adding a loop to ira
> > >
> > > <stefan@xxxxxxxxx> writes:
> > > > I'm working on a new optimization to get rid of spilled tmp variables
> > > > (e.g. introduced by pre) to use the source mem ref instead of a stack
> > > > slot.
> > > >
> > > > To do this, I added a loop into ira.c:ira()
> > > >
> > > >   init_prune_stack_vars ();
> > > >   do
> > > >     {
> > > > #ifndef IRA_NO_OBSTACK
> > > >       gcc_obstack_init (&ira_obstack);
> > > > #endif
> > > >       bitmap_obstack_initialize (&ira_bitmap_obstack);
> > > >
> > > >       ...
> > > >
> > > >       ira_color ();
> > > >
> > > >     }
> > > >   while (flag_prune_stack_vars && prune_stack_vars ());
> > > >
> > > > To get it to work, the prune_stack_vars function resets some data.
> > > > This is mostly working - but on some source files, it fails due to
> > > > invalid reg_equivs.
> > > > This also happens if the optimizer does nothing and just loops once.
> > > >
> > > > Currently I'm calling this before looping again:
> > > >
> > > >   regstat_free_n_sets_and_refs ();
> > > >   regstat_free_ri ();
> > > >   loop_optimizer_finalize ();
> > > >   free_dominance_info (CDI_DOMINATORS);
> > > >
> > > > Any hint, what I'm missing to reset?
> > >
> > > I can't see anything obviously missing.  What kind of failure do you see?
> > > E.g. do you get an internal compiler error or does the compiler generate
> > > incorrect code?
> > >
> > > Do you see the failure on an in-tree test case?  FWIW, I just tried
> > > looping like this locally and didn't see any failures for the tests I
> > > tried.  But I was obviously testing without the new optimisation, and so
> > > each loop iteration should just repeat what the previous one did.
> > >
> > > Not related to the failure, but: do you do anything with the obstacks
> > > when looping again?  Including the initialisations in the loop as above
> > > would introduce a memory leak if you don't do anything to free the
> > > contents.  It'd probably be better to initialise outside the loop unless
> > > you're really confident that no data is carried across iterations.
> > >
> > > Thanks,
> > > Richard
> >
> > Thanks for the ira_obstack hint - I will take care of this once the loop
> > mode is working - maybe I can start looping later or I'll free the memory.
> >
> > In reload: push_reload(...) this raises an error:
> >
> >   gcc_assert (regno < FIRST_PSEUDO_REGISTER
> >               || reg_renumber[regno] >= 0
> >               || reg_equiv_constant (regno) == NULL_RTX);
> >
> > I already know that it's reg_equiv_constant and that this
> > reg_equiv_constant is also set in the unpatched code.
> >
> > So I am looking into why these additional reloads occur.  There are
> > additional reloads if I enable the loop, interestingly for uid like
> > 2, 3, 4 ...
> >
> > Thanks,
> > Stefan
>
> The difference is the additional expr_list, which causes the reload:
>
> (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
>         (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
>                 (const_int 16 [0x10])) [178 this+0 S4 A16]))
>      engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
>      (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
>                 (const_int 16 [0x10])) [178 this+0 S4 A16])
>         (nil)))
>
> => I'll add some code to drop the expr_list from all insns...

I took a wrong turn there: a normal ira pass changes REG_EQUAL notes to
REG_EQUIV notes. This sets the reg_equiv and causes the failure during
reload...

=> I added code to record all insn/REG_EQUAL-note pairs
=> and restore these if the loop is run again - dropping the REG_EQUIV notes.

And this issue went away.

Plus I moved the loop start further down, so the ira_obstack is only
initialized once:

  init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();

=> I can continue to work on the optimizer itself.

To provide an example:

void transformVector(double* restrict inputVector,
                     double const transformMatrix[4][4],
                     double* restrict outputVector)
{
    for (int k = 0; k < 900; k++) {
        double x = *inputVector++;
        double y = *inputVector++;
        double z = *inputVector++;
        for (int l = 0; l < 3; l++) {
            double res = transformMatrix[l][0] * x;
            res += transformMatrix[l][1] * y;
            res += transformMatrix[l][2] * z;
            res += transformMatrix[l][3];
            *outputVector++ = res;
        }
    }
}

m68k-amigaos-gcc -m68080 -O3 x.c -S

yields:

#NO_APP
        .text
        .align 2
        .globl _transformVector
_transformVector:
        link.w a5,#-88
        move.l (16,a5),a0
        move.l (8,a5),a1
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a4/a3/a2,-(sp)
        move.l (12,a5),a2
        move.l (a2)+,(-16,a5)
        move.l (a2)+,(-12,a5)
        lea (21600,a0),a4
        fdmove.d (a2)+,fp7
        move.l (a2)+,(-8,a5)
        move.l (a2)+,(-4,a5)
        move.l (a2)+,(-24,a5)
        move.l (a2)+,(-20,a5)
        move.l (a2)+,(-32,a5)
        move.l (a2)+,(-28,a5)
        move.l (a2)+,(-40,a5)
        move.l (a2)+,(-36,a5)
        move.l (a2)+,(-48,a5)
        move.l (a2)+,(-44,a5)
        move.l (a2)+,(-56,a5)
        move.l (a2)+,(-52,a5)
        move.l (a2)+,(-64,a5)
        move.l (a2)+,(-60,a5)
        move.l (a2)+,(-72,a5)
        move.l (a2)+,(-68,a5)
        move.l (a2)+,(-80,a5)
        move.l (a2)+,(-76,a5)
        move.l (a2),(-88,a5)
        move.l (4,a2),(-84,a5)
.L2:
        fdmove.d (8,a1),fp0
        lea (24,a1),a3
        lea (24,a0),a2
        fdmove.d (a1),fp6
        move.l a3,a1
        fdmove.x fp0,fp4
        fdmove.d (-16,a5),fp2
        fdmul.x fp6,fp2
        fdmul.x fp7,fp4
        ...

And with the new option:

m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars

_transformVector:
        link.w a5,#0
        move.l (16,a5),a1
        move.l (12,a5),a0
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a6/a4/a3/a2,-(sp)
        move.l (8,a5),a2
        lea (21600,a1),a6
.L2:
        fdmove.d (a2),fp2
        lea (24,a2),a4
        lea (24,a1),a3
        fdmove.d (8,a2),fp0
        move.l a4,a2
        fdmove.x fp2,fp3
        fdmove.x fp0,fp5
        fdmul.d (a0),fp3
        fdmul.d (8,a0),fp5
        ...

Btw: the code is not platform specific -> I guess it's generally useful.

Thanks
Stefan
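
PS: For anyone following along, the record/restore idea for the REG_EQUAL
notes can be modeled outside of GCC roughly as below. This is only a toy
sketch and not the actual patch: the types and functions (toy_insn,
record_equal_notes, toy_promote_pass, restore_equal_notes, roundtrip_ok) are
made up for illustration and do not correspond to GCC's rtx structures or
note API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT GCC's insn/note types. */
enum note_kind { NOTE_NONE, NOTE_REG_EQUAL, NOTE_REG_EQUIV };

struct toy_insn {
  int uid;
  enum note_kind note;
};

#define MAX_SAVED 64

struct saved_note {
  struct toy_insn *insn;
  enum note_kind kind;
};

static struct saved_note saved[MAX_SAVED];
static size_t n_saved;

/* Remember every insn that carries a REG_EQUAL note before the pass runs. */
void record_equal_notes (struct toy_insn *insns, size_t n)
{
  n_saved = 0;
  for (size_t i = 0; i < n && n_saved < MAX_SAVED; i++)
    if (insns[i].note == NOTE_REG_EQUAL)
      {
        saved[n_saved].insn = &insns[i];
        saved[n_saved].kind = insns[i].note;
        n_saved++;
      }
}

/* Models the pass turning REG_EQUAL notes into REG_EQUIV notes. */
void toy_promote_pass (struct toy_insn *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i].note == NOTE_REG_EQUAL)
      insns[i].note = NOTE_REG_EQUIV;
}

/* Before looping again: put the recorded notes back, which drops the
   REG_EQUIV notes that replaced them.  */
void restore_equal_notes (void)
{
  for (size_t i = 0; i < n_saved; i++)
    saved[i].insn->note = saved[i].kind;
}

/* Returns 1 if the notes are back in their original state after one
   record / promote / restore round trip.  */
int roundtrip_ok (void)
{
  struct toy_insn insns[3] = {
    { 2, NOTE_REG_EQUAL }, { 3, NOTE_NONE }, { 4, NOTE_REG_EQUAL }
  };
  record_equal_notes (insns, 3);
  toy_promote_pass (insns, 3);
  restore_equal_notes ();
  return insns[0].note == NOTE_REG_EQUAL
         && insns[1].note == NOTE_NONE
         && insns[2].note == NOTE_REG_EQUAL;
}
```

In the toy model, an insn that already had a REG_EQUIV note before the pass
is left alone; only notes that started out as REG_EQUAL are put back.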