Hi, Richard,

I tried to add an LSHIFT_EXPR case to tree-scalar-evolution.c, and now it
yields code like this (x86 again):

.L5:
        movzbl  4(%esi,%eax,4), %edx
        movb    %dl, 4(%ebx,%eax,4)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L5

So the excessive lea is gone. That is great, thank you so much. But I
wonder what else I can do to move the add up and simplify the memory
accesses further (I am guessing this takes some arithmetic re-association,
but I am still not sure where to look). For the architecture I am working
on, this is important. What would you advise?

---
With best regards,
Konstantin

On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener <richard.guenther@xxxxxxxxx> wrote:
> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov
> <konstantin.vladimirov@xxxxxxxxx> wrote:
>> Hi,
>>
>> Nothing changes if everything is unsigned and overflow is guaranteed
>> not to invoke undefined behavior:
>>
>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>> {
>>   unsigned i;
>>
>>   for (i = 1; i != w; ++i)
>>     {
>>       unsigned x = i << 2;
>>       v[x + 4] = t[x + 4];
>>     }
>>
>>   return 0;
>> }
>>
>> yields:
>>
>> .L5:
>>         leal    0(,%eax,4), %edx
>>         addl    $1, %eax
>>         movzbl  4(%edi,%edx), %ecx
>>         cmpl    %ebx, %eax
>>         movb    %cl, 4(%esi,%edx)
>>         jne     .L5
>>
>> What is the SCEV infrastructure (scalar evolutions, I am guessing?),
>> and which files/passes should I look at?
>
> tree-scalar-evolution.c -- look at where it handles MULT_EXPR but
> lacks LSHIFT_EXPR support.
>
> Richard.
>
>> ---
>> With best regards,
>> Konstantin
>>
>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener
>> <richard.guenther@xxxxxxxxx> wrote:
>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov
>>> <konstantin.vladimirov@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> Consider this code:
>>>>
>>>> int foo(char *t, char *v, int w)
>>>> {
>>>>   int i;
>>>>
>>>>   for (i = 1; i != w; ++i)
>>>>     {
>>>>       int x = i << 2;
>>>>       v[x + 4] = t[x + 4];
>>>>     }
>>>>
>>>>   return 0;
>>>> }
>>>>
>>>> Compile it for x86 (I used both gcc 4.7.2 and gcc 4.8.1) with:
>>>>
>>>> gcc -O2 -m32 -S test.c
>>>>
>>>> You will see a loop formed like this:
>>>>
>>>> .L5:
>>>>         leal    0(,%eax,4), %edx
>>>>         addl    $1, %eax
>>>>         movzbl  4(%edi,%edx), %ecx
>>>>         cmpl    %ebx, %eax
>>>>         movb    %cl, 4(%esi,%edx)
>>>>         jne     .L5
>>>>
>>>> But it can easily be simplified to something like this:
>>>>
>>>> .L5:
>>>>         addl    $1, %eax
>>>>         movzbl  (%esi,%eax,4), %edx
>>>>         cmpl    %ecx, %eax
>>>>         movb    %dl, (%ebx,%eax,4)
>>>>         jne     .L5
>>>>
>>>> (i.e. the left shift can be folded into the address).
>>>>
>>>> First, a question for the gcc-help mailing list: maybe there are
>>>> options I have missed, and there IS a way to tell gcc of my intention
>>>> to do this?
>>>>
>>>> And second, a question for the gcc developers mailing list: I am
>>>> working on a private backend and want to add this optimization to it.
>>>> What do you advise -- a custom GIMPLE pass, an RTL pass, modifying an
>>>> existing pass, or something else?
>>>
>>> This looks like a deficiency in induction variable optimization. Note
>>> that i << 2 may overflow, and this overflow does not invoke undefined
>>> behavior but is in the implementation-defined behavior category.
>>>
>>> The issue in this case is likely that the SCEV infrastructure does not
>>> handle left shifts.
>>>
>>> Richard.
>>>
>>>> ---
>>>> With best regards,
>>>> Konstantin
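
As a source-level illustration of the transformation this thread is after
(this is not code from any GCC pass, and the function name foo_reduced is
made up for the example), here is the unsigned loop rewritten by hand so
that the shift becomes the addressing-mode scale and the constant +4 offset
is absorbed into the induction variable:

/* Hand-strength-reduced form of the unsigned loop above.  Substituting
   j = i + 1 turns the accessed index (i << 2) + 4 == (i + 1) * 4 into
   plain j * 4, so the +4 displacement vanishes and the increment can
   run ahead of the memory accesses.  The substitution is exact for all
   w because unsigned arithmetic wraps with defined behavior.  */
unsigned foo_reduced(unsigned char *t, unsigned char *v, unsigned w)
{
  unsigned j;

  /* i ran over 1 .. w-1, so j = i + 1 runs over 2 .. w.  */
  for (j = 2; j != w + 1; ++j)
    v[j << 2] = t[j << 2];   /* store can become movb %dl, (%ebx,%eax,4) */

  return 0;
}

Compiled with gcc -O2 -m32 -S, this form should yield a loop with no lea
and no displacement in the addressing modes, matching the desired assembly
in the first mail -- which suggests the remaining question is whether
IVOPTS can be taught to perform the j = i + 1 substitution itself.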