> -----Ursprüngliche Nachricht----- > Von: gcc-help-owner@xxxxxxxxxxx <gcc-help-owner@xxxxxxxxxxx> Im > Auftrag von stefan@xxxxxxxxx > Gesendet: Freitag, 13. September 2019 14:59 > An: 'Richard Sandiford' <richard.sandiford@xxxxxxx> > Cc: gcc-help@xxxxxxxxxxx > Betreff: AW: new ira optimization - adding a loop to ira > > > -----Ursprüngliche Nachricht----- > > Von: stefan@xxxxxxxxx <stefan@xxxxxxxxx> > > Gesendet: Freitag, 13. September 2019 12:58 > > An: 'Richard Sandiford' <richard.sandiford@xxxxxxx> > > Cc: gcc-help@xxxxxxxxxxx > > Betreff: AW: new ira optimization - adding a loop to ira > > > > > -----Ursprüngliche Nachricht----- > > > Von: stefan@xxxxxxxxx <stefan@xxxxxxxxx> > > > Gesendet: Freitag, 13. September 2019 12:45 > > > An: 'Richard Sandiford' <richard.sandiford@xxxxxxx> > > > Cc: gcc-help@xxxxxxxxxxx > > > Betreff: AW: new ira optimization - adding a loop to ira > > > > > > > -----Ursprüngliche Nachricht----- > > > > Von: Richard Sandiford <richard.sandiford@xxxxxxx> > > > > Gesendet: Freitag, 13. September 2019 12:16 > > > > An: stefan@xxxxxxxxx > > > > Cc: gcc-help@xxxxxxxxxxx > > > > Betreff: Re: new ira optimization - adding a loop to ira > > > > > > > > <stefan@xxxxxxxxx> writes: > > > > > I'm working on a new optimization to get rid of spilled tmp > > > > > variables > > > (e.g. > > > > > introduced by pre) to use the source mem ref instead of a stack > > slot. > > > > > > > > > > To do this, I added a loop into ira.c:ira() > > > > > > > > > > init_prune_stack_vars (); > > > > > do > > > > > { > > > > > #ifndef IRA_NO_OBSTACK > > > > > gcc_obstack_init (&ira_obstack); #endif > > > > > bitmap_obstack_initialize (&ira_bitmap_obstack); > > > > > > > > > > ... > > > > > > > > > > ira_color (); > > > > > > > > > > } > > > > > while (flag_prune_stack_vars && prune_stack_vars ()); > > > > > > > > > > To get it work, the prune_stack_vars function resets a couple of > > data. > > > > > This is mostly working - but on some source files, it fails due > > > > > to invalid reg_equivs. > > > > > Since this also happens, if the optimizer does nothing and just > > > > > loops > > > once. > > > > > > > > > > Currently I'm calling this, before looping again > > > > > > > > > > regstat_free_n_sets_and_refs (); > > > > > regstat_free_ri (); > > > > > loop_optimizer_finalize (); > > > > > free_dominance_info (CDI_DOMINATORS); > > > > > > > > > > Any hint, what I'm missing to reset? > > > > > > > > I can't see anything obviously missing. What kind of failure do > > > > you > > > see? E.g. > > > > do you get an internal compiler error or does the compiler > > > > generate incorrect code? > > > > > > > > Do you see the failure on an in-tree test case? FWIW, I just > > > > tried > > > looping like > > > > this locally and didn't see any failures for the tests I tried. > > > > But I > > > was obviously > > > > testing without the new optimisation, and so each loop iteration > > > > should > > > just > > > > repeat what the previous one did. > > > > > > > > Not related to the failure, but: do you do anything with the > > > > obstacks > > > when > > > > looping again? Including the initialisations in the loop as above > > > > would introduce a memory leak if you don't do anything to free the > > > contents. > > > > It'd probably be better to initialise outside the loop unless > > > > you're > > > really > > > > confident that the no data is carried across iterations. > > > > > > > > Thanks, > > > > Richard > > > > > > Thanks für the ira_obstack hint - I will take care of this, once the > > loop mode > > > is working - maybe I can start looping later or I'll free the memory. > > > > > > In reload: push_reload(...) this raises an error: > > > > > > gcc_assert (regno < FIRST_PSEUDO_REGISTER > > > || reg_renumber[regno] >= 0 > > > || reg_equiv_constant (regno) == NULL_RTX); > > > > > > I already know that it's reg_equiv_constant and that this > > reg_equiv_constant > > > is also set in the unpatched code. > > > > > > So I am looking why these additional reloads occur. There are > > > additional reloads if I enable the loop, interestingly for uid like 2, 3, 4 ... > > > > > > Thanks, > > > Stefan > > > > > > The difference is the additional expr_list, which causes the reload: > > > > (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46]) > > (mem/f/c:SI (plus:SI (reg/f:SI 15 sp) > > (const_int 16 [0x10])) [178 this+0 S4 A16])) > > engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2} > > (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp) > > (const_int 16 [0x10])) [178 this+0 S4 A16]) > > (nil))) > > > > => I'll add some code to drop the expr_list from all insns... > > I took the wrong corner: > > A normal ira pass is changing REG_EQUAL notes to REG_EQUIV notes. This > sets the req_equiv and causes the failure during reload... > > => I added code to record all insn/REG_EQUAL-note pairs => and restore > these if the loop is run again - dropping the REQ_EQUIV notes. > > And this issue went aways. > > Plus I moved the loop start further below, so the ira_obstack is only > initialized once: > > init_prune_stack_vars (); > do > { > init_reg_equiv (); > > > => I can continue to work on the optimizer itself. > > To provide an example: > > void transformVector( double* restrict inputVector, double const > transformMatrix[4][4],double* restrict outputVector) { > for(int k = 0; k < 900; k++) > { > double x = *inputVector++; > double y = *inputVector++; > double z = *inputVector++; > > for(int l = 0; l < 3; l++){ > double res = transformMatrix[l][0] * x; > res += transformMatrix[l][1] * y; > res += transformMatrix[l][2] * z; > res += transformMatrix[l][3]; > *outputVector++ = res; > } > } > } > > m68k-amigaos-gcc -m68080 -O3 x.c -S > > yields: > > #NO_APP > .text > .align 2 > .globl _transformVector > _transformVector: > link.w a5,#-88 > move.l (16,a5),a0 > move.l (8,a5),a1 > fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp) > movem.l a4/a3/a2,-(sp) > move.l (12,a5),a2 > move.l (a2)+,(-16,a5) > move.l (a2)+,(-12,a5) > lea (21600,a0),a4 > fdmove.d (a2)+,fp7 > move.l (a2)+,(-8,a5) > move.l (a2)+,(-4,a5) > move.l (a2)+,(-24,a5) > move.l (a2)+,(-20,a5) > move.l (a2)+,(-32,a5) > move.l (a2)+,(-28,a5) > move.l (a2)+,(-40,a5) > move.l (a2)+,(-36,a5) > move.l (a2)+,(-48,a5) > move.l (a2)+,(-44,a5) > move.l (a2)+,(-56,a5) > move.l (a2)+,(-52,a5) > move.l (a2)+,(-64,a5) > move.l (a2)+,(-60,a5) > move.l (a2)+,(-72,a5) > move.l (a2)+,(-68,a5) > move.l (a2)+,(-80,a5) > move.l (a2)+,(-76,a5) > move.l (a2),(-88,a5) > move.l (4,a2),(-84,a5) > .L2: > fdmove.d (8,a1),fp0 > lea (24,a1),a3 > lea (24,a0),a2 > fdmove.d (a1),fp6 > move.l a3,a1 > fdmove.x fp0,fp4 > fdmove.d (-16,a5),fp2 > fdmul.x fp6,fp2 > fdmul.x fp7,fp4 > ... > > > And with the new option: > > m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars > > _transformVector: > link.w a5,#0 > move.l (16,a5),a1 > move.l (12,a5),a0 > fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp) > movem.l a6/a4/a3/a2,-(sp) > move.l (8,a5),a2 > lea (21600,a1),a6 > .L2: > fdmove.d (a2),fp2 > lea (24,a2),a4 > lea (24,a1),a3 > fdmove.d (8,a2),fp0 > move.l a4,a2 > fdmove.x fp2,fp3 > fdmove.x fp0,fp5 > fdmul.d (a0),fp3 > fdmul.d (8,a0),fp5 > ... > > Btw: the code is not platform specific -> guess it's generally useful > > Thanks > Stefan Ok, it's working now as it should. So if someone else needs to invent a loop at ira(), insert it here: df_analyze (); if (flag_prune_stack_vars) init_prune_stack_vars (); do { init_reg_equiv (); ... ira_color (); } while (flag_prune_stack_vars && prune_stack_vars ()); And if you modify something in your look, you need to do this inside your function: if (touched) { /* make stats visible. */ if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL) calculate_allocation_cost (); /* the lifetime of all registers must be reconsidered - reset what's needed. */ regstat_free_n_sets_and_refs (); regstat_free_ri (); loop_optimizer_finalize (); free_dominance_info (CDI_DOMINATORS); /* plus restore the REG_EQUAL notes which were recorded during init_prune_stack_vars() ! */ std::vector<std::pair<rtx_insn *, rtx> >::iterator i2r = insn2req_equals.begin(); for (;i2r != insn2req_equals.end(); ++i2r) { rtx_insn * insn = i2r->first; REG_NOTES (insn) = i2r->second; } df_mark_solutions_dirty(); df_analyze (); } Maybe I should provide it as a patch?