2011/7/19 Ian Lance Taylor <iant@xxxxxxxxxx>:
> Aurelien Buhrig <aurelien.buhrig.gcc@xxxxxxxxx> writes:
>
>> Is there some resources explaining how GCC generates the first RTL
>> from Gimple, why some alternatives fail, and what should be done to
>> get what we expect ?
>
> Nothing very specific, no.
>
>> For instance, a concrete example with this code:
>>
>> uint32_t M1[16], M2[16];
>>
>> int i;
>> for (i=0; i<16; i++)
>>   M1[i] = M2[i];
>>
>> The generated code is quite horrible... GCC is messed up with
>> SI/PSI/HI trunc and extend, and keeps 2 variables, 1 incremented by 1
>> and shifted 2 for addresses, 1 decremented from 16 to 0 for the
>> loop...
>> I would like GCC to do what it does before I extend Pmode, i.e. extend
>> "i" from HI to PSI, use it to address M1 and M2, increment by
>> sizeof(uint32_t), and test if i != 16*sizeof(uint32_t)...
>>
>> So is this problem related to md file?
>> What should I look at to find such optimization problems?
>
> I don't see offhand why this would be related to RTL expansion.  Look at
> the dump files.  What does the code look like in the last tree dump?
> Look at the dump files for the loop passes in particular; what kind of
> optimizations do they see or fail to see?
>

The last tree dump (final_cleanup) looks like:

<bb 3>:
  M2[i] = M1[i];
  i = i + 1;
  ivtmp.6 = ivtmp.6 - 1;
  if (ivtmp.6 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 4>:

Whereas when Pmode was HImode, it looked like:

<bb 3>:
  MEM[base: &M2 + ivtmp.13] = MEM[base: &M1 + ivtmp.13];
  ivtmp.13 = ivtmp.13 + 2;
  if (ivtmp.13 != 32)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 4>:

which enables better code generation...

The first differences between the two versions appear in the induction
variable optimization pass (ivopts).
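To make the two loop shapes concrete, here is a rough C rendering of what
each one computes. This is only a sketch for illustration, not compiler
output: the function names (copy_indexed, copy_strength_reduced,
shapes_agree) are hypothetical, the copy direction follows the dumps
(M2 is written from M1), and the step is written as sizeof(uint32_t)
rather than a literal constant because the byte size depends on the target.

```c
#include <stdint.h>
#include <string.h>

#define N 16

static uint32_t M1[N], M2[N];

/* Shape seen with Pmode=PSImode: an element index "i" for addressing
   plus a separate down-counter (the role played by ivtmp.6). */
static void copy_indexed(void)
{
    unsigned int i = 0;
    unsigned int count = N;
    do {
        M2[i] = M1[i];          /* address = base + i * sizeof(uint32_t) */
        i = i + 1;
        count = count - 1;
    } while (count != 0);
}

/* Shape seen with Pmode=HImode: a single byte offset (the role played by
   ivtmp.13) serves both as the address offset and as the exit test. */
static void copy_strength_reduced(void)
{
    unsigned int off = 0;
    do {
        *(uint32_t *)((char *)M2 + off) =
            *(const uint32_t *)((const char *)M1 + off);
        off += sizeof(uint32_t);
    } while (off != N * sizeof(uint32_t));
}

/* Returns 1 when both loop shapes produce the same copy of M1 in M2. */
int shapes_agree(void)
{
    uint32_t expected[N];

    for (unsigned int k = 0; k < N; k++)
        M1[k] = 0x01010101u * (k + 1);

    memset(M2, 0, sizeof M2);
    copy_indexed();
    memcpy(expected, M2, sizeof M2);

    memset(M2, 0, sizeof M2);
    copy_strength_reduced();

    return memcmp(expected, M2, sizeof M2) == 0
        && memcmp(M1, M2, sizeof M1) == 0;
}
```

Both functions copy the same 16 elements. The second form needs one
induction variable instead of two and no per-iteration scaling of the
index, which is why it tends to give better code.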
The dump for the new port (Pmode=PSImode) is:

;; Function main (main)

main ()
{
  unsigned int ivtmp.6;
  unsigned int i;
  long unsigned int D.1223;

<bb 2>:

<bb 3>:
  # ivtmp.6_1 = PHI <ivtmp.6_9(4), 16(2)>
  # i_15 = PHI <i_6(4), 0(2)>
  D.1223_5 = M1[i_15];
  M2[i_15] = D.1223_5;
  i_6 = i_15 + 1;
  ivtmp.6_9 = ivtmp.6_1 - 1;
  if (ivtmp.6_9 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return 0;

}

while the dump for the old port (Pmode=HImode) is:

;; Function main (main)

main ()
{
  uint32_t[16] * D.1243;
  uint32_t[16] * D.1242;
  unsigned int ivtmp.13;
  unsigned int ivtmp.6;
  unsigned int i;
  long unsigned int D.1223;

<bb 2>:

<bb 3>:
  # ivtmp.13_14 = PHI <ivtmp.13_13(4), 0(2)>
  D.1242_17 = &M1 + ivtmp.13_14;
  D.1223_5 = MEM[base: D.1242_17];
  D.1243_18 = &M2 + ivtmp.13_14;
  MEM[base: D.1243_18] = D.1223_5;
  ivtmp.13_13 = ivtmp.13_14 + 4;
  if (ivtmp.13_13 != 64)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return 0;

}

Perhaps GCC is not able to generate

  D.1242_17 = &M1 + ivtmp.13_14;

(a line from the old port) because &M1 and ivtmp.13_14 are no longer
both HImode (but PSImode and HImode)?

Is there something I can do to optimize this?
And generally speaking, is the tree generation influenced by the RTL
description in any way?

Thanks!
Aurélien