On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov <konstantin.vladimirov@xxxxxxxxx> wrote: > Hi, > > Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c > and now it yields code like (x86 again): > > .L5: > movzbl 4(%esi,%eax,4), %edx > movb %dl, 4(%ebx,%eax,4) > addl $1, %eax > cmpl %ecx, %eax > jne .L5 > > So, excessive lea is gone. It is great, thank you so much. But I > wonder what else can I do to move add upper to simplify memory > accesses (I am guessing, this is some arithmetical re-associations, > still not sure where to look). For architecture, I am working on, it > is important. What would you advise? You need to look at IVOPTs and how it arrives at the choice of induction variables. Richard. > --- > With best regards, Konstantin > > On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener > <richard.guenther@xxxxxxxxx> wrote: >> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov >> <konstantin.vladimirov@xxxxxxxxx> wrote: >>> Hi, >>> >>> nothing changes if everything is unsigned and we are guaranteed to not >>> raise UB on overflow: >>> >>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w) >>> { >>> unsigned i; >>> >>> for (i = 1; i != w; ++i) >>> { >>> unsigned x = i << 2; >>> v[x + 4] = t[x + 4]; >>> } >>> >>> return 0; >>> } >>> >>> yields: >>> >>> .L5: >>> leal 0(,%eax,4), %edx >>> addl $1, %eax >>> movzbl 4(%edi,%edx), %ecx >>> cmpl %ebx, %eax >>> movb %cl, 4(%esi,%edx) >>> jne .L5 >>> >>> What is SCEV infrastructure (guessing scalar evolutions?) and what >>> files/passes to look in? >> >> tree-scalar-evolution.c, look at where it handles MULT_EXPR but >> lacks LSHIFT_EXPR support. >> >> Richard. >> >>> --- >>> With best regards, Konstantin >>> >>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener >>> <richard.guenther@xxxxxxxxx> wrote: >>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov >>>> <konstantin.vladimirov@xxxxxxxxx> wrote: >>>>> Hi, >>>>> >>>>> Consider code: >>>>> >>>>> int foo(char *t, char *v, int w) >>>>> { >>>>> int i; >>>>> >>>>> for (i = 1; i != w; ++i) >>>>> { >>>>> int x = i << 2; >>>>> v[x + 4] = t[x + 4]; >>>>> } >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options: >>>>> >>>>> gcc -O2 -m32 -S test.c >>>>> >>>>> You will see loop, formed like: >>>>> >>>>> .L5: >>>>> leal 0(,%eax,4), %edx >>>>> addl $1, %eax >>>>> movzbl 4(%edi,%edx), %ecx >>>>> cmpl %ebx, %eax >>>>> movb %cl, 4(%esi,%edx) >>>>> jne .L5 >>>>> >>>>> But it can be easily simplified to something like this: >>>>> >>>>> .L5: >>>>> addl $1, %eax >>>>> movzbl (%esi,%eax,4), %edx >>>>> cmpl %ecx, %eax >>>>> movb %dl, (%ebx,%eax,4) >>>>> jne .L5 >>>>> >>>>> (i.e. left shift may be moved to address). >>>>> >>>>> First question to gcc-help maillist. May be there are some options, >>>>> that I've missed, and there IS a way to explain gcc my intention to do >>>>> this? >>>>> >>>>> And second question to gcc developers mail list. I am working on >>>>> private backend and want to add this optimization to my backend. What >>>>> do you advise me to do -- custom gimple pass, or rtl pass, or modify >>>>> some existent pass, etc? >>>> >>>> This looks like a deficiency in induction variable optimization. Note >>>> that i << 2 may overflow and this overflow does not invoke undefined >>>> behavior but is in the implementation defined behavior category. >>>> >>>> The issue in this case is likely that the SCEV infrastructure does not handle >>>> left-shifts. >>>> >>>> Richard. >>>> >>>>> --- >>>>> With best regards, Konstantin