Re: x86 gcc lacks simple optimization

Richard Biener <richard.guenther@xxxxxxxxx> · Fri, 6 Dec 2013 15:17:14 +0100

On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov
<konstantin.vladimirov@xxxxxxxxx> wrote:
> Hi,
>
> Richard, I tried adding an LSHIFT_EXPR case to tree-scalar-evolution.c
> and now it yields code like this (x86 again):
>
> .L5:
> movzbl 4(%esi,%eax,4), %edx
> movb %dl, 4(%ebx,%eax,4)
> addl $1, %eax
> cmpl %ecx, %eax
> jne .L5
>
> So the excess lea is gone. That is great, thank you so much. But I
> wonder what else I can do to move the add up and simplify the memory
> accesses (I am guessing this is some arithmetic reassociation, but I
> am still not sure where to look). This matters for the architecture I
> am working on. What would you advise?

You need to look at IVOPTs and how it arrives at the choice of
induction variables.
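
As a rough illustration only (not GCC source), the reassociation being
asked for is that (i << 2) + 4 equals (i + 1) * 4 as long as nothing
wraps, so the loop can run on the already-incremented counter and use a
plain scaled address with no displacement. The sketch below assumes
1 <= w and buffers large enough for index 4 * w; the names
copy_original, copy_reassociated, check_equivalent and the BUF size are
hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { BUF = 256 };   /* hypothetical buffer size for the check below */

/* Loop as posted in the thread: shift and +4 computed every iteration. */
static void copy_original(const unsigned char *t, unsigned char *v, unsigned w)
{
    unsigned i;

    for (i = 1; i != w; ++i) {
        unsigned x = i << 2;
        v[x + 4] = t[x + 4];
    }
}

/* Hand-reassociated form: j stands for i + 1, so (i << 2) + 4 == j * 4. */
static void copy_reassociated(const unsigned char *t, unsigned char *v, unsigned w)
{
    unsigned j;

    for (j = 2; j != w + 1; ++j)
        v[j * 4u] = t[j * 4u];
}

/* Returns 1 if both loops write identical data, 0 if they differ or if
   w is outside the range this sketch supports (1 <= w, 4 * w < BUF). */
static int check_equivalent(unsigned w)
{
    unsigned char src[BUF], a[BUF], b[BUF];
    unsigned k;

    if (w < 1 || 4u * w >= BUF)
        return 0;

    for (k = 0; k < BUF; ++k)
        src[k] = (unsigned char)(k * 7u + 3u);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy_original(src, a, w);
    copy_reassociated(src, b, w);
    return memcmp(a, b, sizeof a) == 0;
}
```

For instance, assert(check_equivalent(16)) is expected to hold. Building
the first function with -O2 -fdump-tree-ivopts-details should produce a
dump showing which induction variable candidates IVOPTs considered.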

Richard.

> ---
> With best regards, Konstantin
>
> On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener
> <richard.guenther@xxxxxxxxx> wrote:
>> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov
>> <konstantin.vladimirov@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> nothing changes if everything is unsigned and we are guaranteed to not
>>> raise UB on overflow:
>>>
>>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>>> {
>>>   unsigned i;
>>>
>>>   for (i = 1; i != w; ++i)
>>>     {
>>>       unsigned x = i << 2;
>>>       v[x + 4] = t[x + 4];
>>>     }
>>>
>>>   return 0;
>>> }
>>>
>>> yields:
>>>
>>> .L5:
>>> leal 0(,%eax,4), %edx
>>> addl $1, %eax
>>> movzbl 4(%edi,%edx), %ecx
>>> cmpl %ebx, %eax
>>> movb %cl, 4(%esi,%edx)
>>> jne .L5
>>>
>>> What is the SCEV infrastructure (scalar evolutions, I guess?), and
>>> which files/passes should I look in?
>>
>> tree-scalar-evolution.c, look at where it handles MULT_EXPR but
>> lacks LSHIFT_EXPR support.
>>
>> Richard.
>>
>>> ---
>>> With best regards, Konstantin
>>>
>>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener
>>> <richard.guenther@xxxxxxxxx> wrote:
>>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov
>>>> <konstantin.vladimirov@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> Consider this code:
>>>>>
>>>>> int foo(char *t, char *v, int w)
>>>>> {
>>>>>   int i;
>>>>>
>>>>>   for (i = 1; i != w; ++i)
>>>>>     {
>>>>>       int x = i << 2;
>>>>>       v[x + 4] = t[x + 4];
>>>>>     }
>>>>>
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>>>>>
>>>>> gcc -O2 -m32 -S test.c
>>>>>
>>>>> You will see a loop like this:
>>>>>
>>>>> .L5:
>>>>> leal 0(,%eax,4), %edx
>>>>> addl $1, %eax
>>>>> movzbl 4(%edi,%edx), %ecx
>>>>> cmpl %ebx, %eax
>>>>> movb %cl, 4(%esi,%edx)
>>>>> jne .L5
>>>>>
>>>>> But it can be easily simplified to something like this:
>>>>>
>>>>> .L5:
>>>>> addl $1, %eax
>>>>> movzbl (%esi,%eax,4), %edx
>>>>> cmpl %ecx, %eax
>>>>> movb %dl, (%ebx,%eax,4)
>>>>> jne .L5
>>>>>
>>>>> (i.e. the left shift can be folded into the address).
>>>>>
>>>>> First, a question for the gcc-help mailing list: maybe there are some
>>>>> options that I've missed, and there IS a way to tell gcc that I want
>>>>> this?
>>>>>
>>>>> Second, a question for the gcc developers' mailing list: I am working
>>>>> on a private backend and want to add this optimization to it. What
>>>>> do you advise me to do: a custom GIMPLE pass, an RTL pass, modifying
>>>>> an existing pass, etc.?
>>>>
>>>> This looks like a deficiency in induction variable optimization.  Note
>>>> that i << 2 may overflow and this overflow does not invoke undefined
>>>> behavior but is in the implementation defined behavior category.
>>>>
>>>> The issue in this case is likely that the SCEV infrastructure does not handle
>>>> left-shifts.
>>>>
>>>> Richard.
>>>>
>>>>> ---
>>>>> With best regards, Konstantin