Re: x86 gcc lacks simple optimization

Konstantin Vladimirov <konstantin.vladimirov@xxxxxxxxx> · Fri, 6 Dec 2013 13:40:00 +0400

Hi,

The x86 example was only for ease of reproduction. I am pretty sure
this is an architecture-independent issue. For example, on ARM:

.L2:
mov ip, r3, asl #2
add ip, ip, #4
add r3, r3, #1
ldrb r4, [r0, ip] @ zero_extendqisi2
cmp r3, r2
strb r4, [r1, ip]
bne .L2

This could be improved to:

.L2:
add r3, r3, #1
ldrb ip, [r0, r3, asl #2] @ zero_extendqisi2
cmp r3, r2
strb ip, [r1, r3, asl #2]
bne .L2

And so on. I am personally more comfortable with x86, but that is only
a matter of taste.

To get the improved version of the code, I did by hand what the
compiler is expected to do automatically. Since (i << 2) + 4 is the
same as (i + 1) << 2, I rewrote the source as:

int foo(char *t, char *v, int w)
{
  int i;

  for (i = 1; i != w; ++i)
    {
      v[(i + 1) << 2] = t[(i + 1) << 2];
    }

  return 0;
}
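
As a sanity check that the rewrite does not change behaviour, the two
forms can be compared side by side. This is only a minimal sketch: the
names foo_shift_add, foo_folded and forms_agree are made up for
illustration, the buffer size of 64 is arbitrary, and w is assumed to be
small and positive so that every index stays inside the buffers.

```c
#include <assert.h>
#include <string.h>

/* Original form from the thread: shift first, then add a constant offset. */
static int foo_shift_add(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i) {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hand-rewritten form: the offset is folded into the index before the shift. */
static int foo_folded(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i)
        v[(i + 1) << 2] = t[(i + 1) << 2];
    return 0;
}

/* Hypothetical helper: returns 1 if both forms copy exactly the same bytes. */
static int forms_agree(int w)
{
    enum { N = 64 };               /* arbitrary buffer size for the sketch */
    char src[N], a[N], b[N];
    int k;

    /* The largest index touched is w << 2 (reached when i == w - 1). */
    assert(w >= 1 && (w << 2) < N);

    for (k = 0; k < N; ++k)
        src[k] = (char)(k + 1);    /* distinct, non-zero source bytes */
    memset(a, 0, N);
    memset(b, 0, N);

    foo_shift_add(src, a, w);
    foo_folded(src, b, w);

    return memcmp(a, b, N) == 0;
}
```

Calling forms_agree from a small main for a few values of w is enough to
exercise it. It only confirms that the two sources are equivalent; it
says nothing about which addressing form a given compiler will pick.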

The private backend I am working on is not a modification of any
existing backend; it is written from scratch.

---
With best regards, Konstantin

On Fri, Dec 6, 2013 at 1:27 PM, David Brown <david@xxxxxxxxxxxxxxx> wrote:
> On 06/12/13 09:30, Konstantin Vladimirov wrote:
>> Hi,
>>
>> Consider code:
>>
>> int foo(char *t, char *v, int w)
>> {
>> int i;
>>
>> for (i = 1; i != w; ++i)
>> {
>> int x = i << 2;
>> v[x + 4] = t[x + 4];
>> }
>>
>> return 0;
>> }
>>
>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>>
>> gcc -O2 -m32 -S test.c
>>
>> You will see loop, formed like:
>>
>> .L5:
>> leal 0(,%eax,4), %edx
>> addl $1, %eax
>> movzbl 4(%edi,%edx), %ecx
>> cmpl %ebx, %eax
>> movb %cl, 4(%esi,%edx)
>> jne .L5
>>
>> But it can be easily simplified to something like this:
>>
>> .L5:
>> addl $1, %eax
>> movzbl (%esi,%eax,4), %edx
>> cmpl %ecx, %eax
>> movb %dl, (%ebx,%eax,4)
>> jne .L5
>>
>> (i.e. left shift may be moved to address).
>>
>> First question to gcc-help maillist. May be there are some options,
>> that I've missed, and there IS a way to explain gcc my intention to do
>> this?
>>
>> And second question to gcc developers mail list. I am working on
>> private backend and want to add this optimization to my backend. What
>> do you advise me to do -- custom gimple pass, or rtl pass, or modify
>> some existent pass, etc?
>>
>
> Hi,
>
> Usually the gcc developers are not keen on emails going to both the help
> and development list - they prefer to keep them separate.
>
> My first thought when someone finds a "missed optimisation" issue,
> especially with the x86 target, is are you /sure/ this code is slower?
> x86 chips are immensely complex, and the interplay between different
> instructions, pipelines, superscaling, etc., means that code that might
> appear faster, can actually be slower.  So please check your
> architecture flags (i.e., are you optimising for the "native" cpu, or
> any other specific cpu - optimised code can be different for different
> x86 cpus).  Then /measure/ the speed of the code to see if there is a
> real difference.
>
>
> Regarding your "private backend" - is this a modification of the x86
> backend, or a completely different target?  If it is x86, then I think
> the answer is "don't do it - work with the mainline code".  If it is
> something else, then an x86-specific optimisation is of little use anyway.
>
> mvh.,
>
> David
>
>
>