Hi, Richard,

I tried to add an LSHIFT_EXPR case to tree-scalar-evolution.c, and now it
yields code like this (x86 again):

.L5:
        movzbl  4(%esi,%eax,4), %edx
        movb    %dl, 4(%ebx,%eax,4)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L5

So the excessive lea is gone. That is great, thank you so much. But I
wonder what else I can do to move the add up and simplify the memory
accesses further (I am guessing this takes some arithmetic re-association,
but I am still not sure where to look). For the architecture I am working
on, this is important. What would you advise?

---
With best regards,
Konstantin

On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener <richard.guenther@xxxxxxxxx> wrote:
> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov
> <konstantin.vladimirov@xxxxxxxxx> wrote:
>> Hi,
>>
>> Nothing changes if everything is unsigned and overflow is guaranteed
>> not to invoke undefined behavior:
>>
>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>> {
>>   unsigned i;
>>
>>   for (i = 1; i != w; ++i)
>>     {
>>       unsigned x = i << 2;
>>       v[x + 4] = t[x + 4];
>>     }
>>
>>   return 0;
>> }
>>
>> yields:
>>
>> .L5:
>>         leal    0(,%eax,4), %edx
>>         addl    $1, %eax
>>         movzbl  4(%edi,%edx), %ecx
>>         cmpl    %ebx, %eax
>>         movb    %cl, 4(%esi,%edx)
>>         jne     .L5
>>
>> What is the SCEV infrastructure (scalar evolutions, I am guessing?),
>> and which files/passes should I look at?
>
> tree-scalar-evolution.c -- look at where it handles MULT_EXPR but
> lacks LSHIFT_EXPR support.
>
> Richard.
>
>> ---
>> With best regards,
>> Konstantin
>>
>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener
>> <richard.guenther@xxxxxxxxx> wrote:
>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov
>>> <konstantin.vladimirov@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> Consider this code:
>>>>
>>>> int foo(char *t, char *v, int w)
>>>> {
>>>>   int i;
>>>>
>>>>   for (i = 1; i != w; ++i)
>>>>     {
>>>>       int x = i << 2;
>>>>       v[x + 4] = t[x + 4];
>>>>     }
>>>>
>>>>   return 0;
>>>> }
>>>>
>>>> Compile it for x86 (I used both gcc 4.7.2 and gcc 4.8.1) with:
>>>>
>>>> gcc -O2 -m32 -S test.c
>>>>
>>>> You will see a loop formed like this:
>>>>
>>>> .L5:
>>>>         leal    0(,%eax,4), %edx
>>>>         addl    $1, %eax
>>>>         movzbl  4(%edi,%edx), %ecx
>>>>         cmpl    %ebx, %eax
>>>>         movb    %cl, 4(%esi,%edx)
>>>>         jne     .L5
>>>>
>>>> But it can easily be simplified to something like this:
>>>>
>>>> .L5:
>>>>         addl    $1, %eax
>>>>         movzbl  (%esi,%eax,4), %edx
>>>>         cmpl    %ecx, %eax
>>>>         movb    %dl, (%ebx,%eax,4)
>>>>         jne     .L5
>>>>
>>>> (i.e. the left shift can be folded into the address).
>>>>
>>>> First, a question for the gcc-help mailing list: maybe there are
>>>> options I have missed, and there IS a way to tell gcc of my intention
>>>> to do this?
>>>>
>>>> And second, a question for the gcc developers mailing list: I am
>>>> working on a private backend and want to add this optimization to it.
>>>> What do you advise -- a custom GIMPLE pass, an RTL pass, modifying an
>>>> existing pass, or something else?
>>>
>>> This looks like a deficiency in induction variable optimization. Note
>>> that i << 2 may overflow, and this overflow does not invoke undefined
>>> behavior but is in the implementation-defined behavior category.
>>>
>>> The issue in this case is likely that the SCEV infrastructure does not
>>> handle left shifts.
>>>
>>> Richard.
>>>
>>>> ---
>>>> With best regards,
>>>> Konstantin
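
As a source-level illustration of the transformation this thread is after
(this is not code from any GCC pass, and the function name foo_reduced is
made up for the example), here is the unsigned loop rewritten by hand so
that the shift becomes the addressing-mode scale and the constant +4 offset
is absorbed into the induction variable:

/* Hand-strength-reduced form of the unsigned loop above.  Substituting
   j = i + 1 turns the accessed index (i << 2) + 4 == (i + 1) * 4 into
   plain j * 4, so the +4 displacement vanishes and the increment can
   run ahead of the memory accesses.  The substitution is exact for all
   w because unsigned arithmetic wraps with defined behavior.  */
unsigned foo_reduced(unsigned char *t, unsigned char *v, unsigned w)
{
  unsigned j;

  /* i ran over 1 .. w-1, so j = i + 1 runs over 2 .. w.  */
  for (j = 2; j != w + 1; ++j)
    v[j << 2] = t[j << 2];   /* store can become movb %dl, (%ebx,%eax,4) */

  return 0;
}

Compiled with gcc -O2 -m32 -S, this form should yield a loop with no lea
and no displacement in the addressing modes, matching the desired assembly
in the first mail -- which suggests the remaining question is whether
IVOPTS can be taught to perform the j = i + 1 substitution itself.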