x86 gcc lacks simple optimization

Konstantin Vladimirov <konstantin.vladimirov@xxxxxxxxx> · Fri, 6 Dec 2013 12:30:54 +0400

Hi,

Consider the following code:

int foo(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i)
    {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }

    return 0;
}

Compile it for x86 (I tried both gcc 4.7.2 and gcc 4.8.1) with these options:

gcc -O2 -m32 -S test.c

You will see a loop like this:

.L5:
    leal    0(,%eax,4), %edx
    addl    $1, %eax
    movzbl  4(%edi,%edx), %ecx
    cmpl    %ebx, %eax
    movb    %cl, 4(%esi,%edx)
    jne     .L5

But it can be easily simplified to something like this:

.L5:
    addl    $1, %eax
    movzbl  (%esi,%eax,4), %edx
    cmpl    %ecx, %eax
    movb    %dl, (%ebx,%eax,4)
    jne     .L5

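(i.e. the left shift can be folded into the addressing mode as a scale
factor of 4; the +4 displacement disappears because the index register
has already been incremented when the load and store execute).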
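
The same identity can be checked at the source level: with j = i + 1,
(i << 2) + 4 equals j * 4. Below is a minimal sketch, not a tested
workaround. The names foo_scaled and same_result are hypothetical and
are not part of the code above, and I make no claim that gcc will
select the scaled addressing mode for the rewritten loop. The sketch
only shows that the two loops touch the same bytes.

```c
#include <assert.h>
#include <string.h>

/* Original loop, as posted. */
static int foo(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i)
    {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }

    return 0;
}

/* Hypothetical rewrite: j = i + 1, so (i << 2) + 4 == j * 4. */
static int foo_scaled(char *t, char *v, int w)
{
    int j;

    for (j = 2; j != w + 1; ++j)
        v[j * 4] = t[j * 4];

    return 0;
}

enum { BUF = 256 };

/* Hypothetical helper: returns 1 if both loops write identical bytes. */
static int same_result(int w)
{
    char src[BUF], a[BUF], b[BUF];
    int k;

    /* The highest index touched is 4 * w, so it must fit in the buffer. */
    assert(w >= 1 && 4 * w < BUF);

    for (k = 0; k < BUF; ++k)
        src[k] = (char)(k % 100 + 1);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    foo(src, a, w);
    foo_scaled(src, b, w);

    return memcmp(a, b, sizeof a) == 0;
}
```

Calling same_result(w) for a few values of w with 1 <= w < 64 is enough
to see that the two forms agree for the buffer size assumed here.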

First, a question for the gcc-help mailing list: maybe there are some
options that I have missed, and there is a way to tell gcc that this is
what I want?

Second, a question for the gcc developers' mailing list: I am working
on a private backend and want to add this optimization to it. What
would you advise: a custom GIMPLE pass, an RTL pass, a modification to
an existing pass, or something else?

---
With best regards, Konstantin