Re: x86 gcc lacks simple optimization

Richard Biener <richard.guenther@xxxxxxxxx> · Fri, 6 Dec 2013 11:10:27 +0100

On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov
<konstantin.vladimirov@xxxxxxxxx> wrote:
> Hi,
>
> Consider code:
>
> int foo(char *t, char *v, int w)
> {
> int i;
>
> for (i = 1; i != w; ++i)
> {
> int x = i << 2;
> v[x + 4] = t[x + 4];
> }
>
> return 0;
> }
>
> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>
> gcc -O2 -m32 -S test.c
>
> You will see loop, formed like:
>
> .L5:
> leal 0(,%eax,4), %edx
> addl $1, %eax
> movzbl 4(%edi,%edx), %ecx
> cmpl %ebx, %eax
> movb %cl, 4(%esi,%edx)
> jne .L5
>
> But it can be easily simplified to something like this:
>
> .L5:
> addl $1, %eax
> movzbl (%esi,%eax,4), %edx
> cmpl %ecx, %eax
> movb %dl, (%ebx,%eax,4)
> jne .L5
>
> (i.e. left shift may be moved to address).
>
> First question to gcc-help maillist. May be there are some options,
> that I've missed, and there IS a way to explain gcc my intention to do
> this?
>
> And second question to gcc developers mail list. I am working on
> private backend and want to add this optimization to my backend. What
> do you advise me to do -- custom gimple pass, or rtl pass, or modify
> some existent pass, etc?

This looks like a deficiency in induction variable optimization.  Note
that i << 2 may overflow and this overflow does not invoke undefined
behavior but is in the implementation defined behavior category.

The issue in this case is likely that the SCEV infrastructure does not handle
left-shifts.

Richard.

> ---
> With best regards, Konstantin