Hi,

The x86 example was only for ease of reproduction. I am pretty sure
this is an architecture-independent issue. Say, on ARM:

.L2:
  mov ip, r3, asl #2
  add ip, ip, #4
  add r3, r3, #1
  ldrb r4, [r0, ip] @ zero_extendqisi2
  cmp r3, r2
  strb r4, [r1, ip]
  bne .L2

may be improved to:

.L2:
  add r3, r3, #1
  ldrb ip, [r0, r3, asl #2] @ zero_extendqisi2
  cmp r3, r2
  strb ip, [r1, r3, asl #2]
  bne .L2

And so on. I myself feel more comfortable with x86, but that is only a
matter of taste.

To get the improved version of the code, I just did by hand what the
compiler is expected to do automatically, i.e. rewrote the loop as:

int foo(char *t, char *v, int w)
{
  int i;

  for (i = 1; i != w; ++i)
  {
    v[(i + 1) << 2] = t[(i + 1) << 2];
  }

  return 0;
}

The private backend I am working on isn't a modification of any
existing backend; it is written from scratch.

---
With best regards, Konstantin

On Fri, Dec 6, 2013 at 1:27 PM, David Brown <david@xxxxxxxxxxxxxxx> wrote:
> On 06/12/13 09:30, Konstantin Vladimirov wrote:
>> Hi,
>>
>> Consider code:
>>
>> int foo(char *t, char *v, int w)
>> {
>>   int i;
>>
>>   for (i = 1; i != w; ++i)
>>   {
>>     int x = i << 2;
>>     v[x + 4] = t[x + 4];
>>   }
>>
>>   return 0;
>> }
>>
>> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>>
>> gcc -O2 -m32 -S test.c
>>
>> You will see loop, formed like:
>>
>> .L5:
>>   leal 0(,%eax,4), %edx
>>   addl $1, %eax
>>   movzbl 4(%edi,%edx), %ecx
>>   cmpl %ebx, %eax
>>   movb %cl, 4(%esi,%edx)
>>   jne .L5
>>
>> But it can be easily simplified to something like this:
>>
>> .L5:
>>   addl $1, %eax
>>   movzbl (%esi,%eax,4), %edx
>>   cmpl %ecx, %eax
>>   movb %dl, (%ebx,%eax,4)
>>   jne .L5
>>
>> (i.e. left shift may be moved to address).
>>
>> First question to gcc-help maillist. May be there are some options,
>> that I've missed, and there IS a way to explain gcc my intention to do
>> this?
>>
>> And second question to gcc developers mail list. I am working on
>> private backend and want to add this optimization to my backend.
>> What do you advise me to do -- custom gimple pass, or rtl pass, or
>> modify some existent pass, etc?
>>
>
> Hi,
>
> Usually the gcc developers are not keen on emails going to both the help
> and development list - they prefer to keep them separate.
>
> My first thought when someone finds a "missed optimisation" issue,
> especially with the x86 target, is are you /sure/ this code is slower?
> x86 chips are immensely complex, and the interplay between different
> instructions, pipelines, superscaling, etc., means that code that might
> appear faster, can actually be slower. So please check your
> architecture flags (i.e., are you optimising for the "native" cpu, or
> any other specific cpu - optimised code can be different for different
> x86 cpus). Then /measure/ the speed of the code to see if there is a
> real difference.
>
>
> Regarding your "private backend" - is this a modification of the x86
> backend, or a completely different target? If it is x86, then I think
> the answer is "don't do it - work with the mainline code". If it is
> something else, then an x86-specific optimisation is of little use anyway.
>
> mvh.,
>
> David
>
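
For anyone who wants to confirm that the hand rewrite preserves
behaviour before comparing the generated assembly, here is a minimal
sketch in C. foo_orig and foo_rewritten are the two loop forms from this
thread; the helper same_result, the 256-byte buffers and the fill
pattern are assumptions made purely for illustration.

```c
#include <string.h>

/* Original form: the shift goes into a temporary, then +4 is added. */
static int foo_orig(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i) {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hand-rewritten form: the +4 is folded into the index before the shift,
   since (i << 2) + 4 == (i + 1) << 2. */
static int foo_rewritten(char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i) {
        v[(i + 1) << 2] = t[(i + 1) << 2];
    }
    return 0;
}

/* Hypothetical helper (not from the thread): returns 1 if both versions
   leave identical destination buffers. Requires 1 <= w and 4 * w < 256,
   because the highest index touched is 4 * w. */
static int same_result(int w)
{
    char src[256], a[256], b[256];
    int k;

    for (k = 0; k < 256; ++k)
        src[k] = (char)(k * 7 + 3);   /* arbitrary non-constant pattern */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    foo_orig(src, a, w);
    foo_rewritten(src, b, w);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling same_result(16) from a small main checks functional equivalence
only. It says nothing about speed, so David's advice to measure on the
actual target still applies.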