Hi,

I'm modifying my previous GCC port (16-bit MCU) to extend Pmode from HI to PSI. My backend finally seems to work (from what I've seen so far...). The problem now is with optimizations, and especially loop optimizations.

Are there any resources explaining how GCC generates the first RTL from GIMPLE, why some alternatives fail, and what should be done to get the expected result?

For instance, a concrete example with this code:

    uint32_t M1[16], M2[16];
    int i;
    for (i = 0; i < 16; i++)
        M1[i] = M2[i];

The generated code is quite horrible. GCC gets tangled up in SI/PSI/HI truncations and extensions, and keeps two variables: one incremented by 1 and shifted by 2 to form the addresses, and one decremented from 16 to 0 to control the loop.

I would like GCC to do what it did before I extended Pmode, i.e. extend "i" from HI to PSI, use it to address M1 and M2, increment it by sizeof(uint32_t), and test whether i != 16*sizeof(uint32_t).

So is this problem related to the md file? What should I look at to track down this kind of optimization problem?

Thanks,
Aurélien
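
For reference, the two loop shapes described above can be written out at the C level. This is only an illustrative sketch: the function names copy_two_ivs and copy_one_iv and the macro N are hypothetical, size_t merely stands in for a pointer-sized (PSI) offset, and the code models the structure of the loops rather than reproducing any actual RTL or assembly from the port. It also assumes sizeof(uint32_t) == 4, which is what the shift by 2 implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 16   /* hypothetical name for the element count used in the post */

uint32_t M1[N], M2[N];

/* Shape observed after extending Pmode: two induction variables.
   "i" is incremented by 1 and shifted by 2 to form the byte offset,
   while a separate counter runs from 16 down to 0 to end the loop. */
void copy_two_ivs(void)
{
    int i = 0;
    int count = N;

    assert(sizeof(uint32_t) == 4);   /* assumption behind the shift by 2 */
    do {
        size_t off = (size_t)i << 2;
        *(uint32_t *)((char *)M1 + off) = *(const uint32_t *)((const char *)M2 + off);
        i++;
    } while (--count != 0);
}

/* Shape wanted: a single pointer-sized byte offset that is used for
   both addresses, stepped by sizeof(uint32_t), and compared against
   16 * sizeof(uint32_t). */
void copy_one_iv(void)
{
    size_t off;

    for (off = 0; off != N * sizeof(uint32_t); off += sizeof(uint32_t))
        *(uint32_t *)((char *)M1 + off) = *(const uint32_t *)((const char *)M2 + off);
}
```

Both functions copy the same 16 elements; the only difference is how many induction variables the loop carries and in which mode the address arithmetic is done.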