On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov <konstantin.vladimirov@xxxxxxxxx> wrote: > Hi, > > nothing changes if everything is unsigned and we are guaranteed to not > raise UB on overflow: > > unsigned foo(unsigned char *t, unsigned char *v, unsigned w) > { > unsigned i; > > for (i = 1; i != w; ++i) > { > unsigned x = i << 2; > v[x + 4] = t[x + 4]; > } > > return 0; > } > > yields: > > .L5: > leal 0(,%eax,4), %edx > addl $1, %eax > movzbl 4(%edi,%edx), %ecx > cmpl %ebx, %eax > movb %cl, 4(%esi,%edx) > jne .L5 > > What is SCEV infrastructure (guessing scalar evolutions?) and what > files/passes to look in? tree-scalar-evolution.c, look at where it handles MULT_EXPR but lacks LSHIFT_EXPR support. Richard. > --- > With best regards, Konstantin > > On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener > <richard.guenther@xxxxxxxxx> wrote: >> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov >> <konstantin.vladimirov@xxxxxxxxx> wrote: >>> Hi, >>> >>> Consider code: >>> >>> int foo(char *t, char *v, int w) >>> { >>> int i; >>> >>> for (i = 1; i != w; ++i) >>> { >>> int x = i << 2; >>> v[x + 4] = t[x + 4]; >>> } >>> >>> return 0; >>> } >>> >>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options: >>> >>> gcc -O2 -m32 -S test.c >>> >>> You will see loop, formed like: >>> >>> .L5: >>> leal 0(,%eax,4), %edx >>> addl $1, %eax >>> movzbl 4(%edi,%edx), %ecx >>> cmpl %ebx, %eax >>> movb %cl, 4(%esi,%edx) >>> jne .L5 >>> >>> But it can be easily simplified to something like this: >>> >>> .L5: >>> addl $1, %eax >>> movzbl (%esi,%eax,4), %edx >>> cmpl %ecx, %eax >>> movb %dl, (%ebx,%eax,4) >>> jne .L5 >>> >>> (i.e. left shift may be moved to address). >>> >>> First question to gcc-help maillist. May be there are some options, >>> that I've missed, and there IS a way to explain gcc my intention to do >>> this? >>> >>> And second question to gcc developers mail list. I am working on >>> private backend and want to add this optimization to my backend. What >>> do you advise me to do -- custom gimple pass, or rtl pass, or modify >>> some existent pass, etc? >> >> This looks like a deficiency in induction variable optimization. Note >> that i << 2 may overflow and this overflow does not invoke undefined >> behavior but is in the implementation defined behavior category. >> >> The issue in this case is likely that the SCEV infrastructure does not handle >> left-shifts. >> >> Richard. >> >>> --- >>> With best regards, Konstantin