On 11/07/10 11:00, Justin Lebar wrote:
Hi, all.
I'm confused about an instruction gcc is generating. It looks
unnecessary to me, but also doesn't appear to be an intentional nop.
I'm no expert, so it's likely I'm misunderstanding something.
I'm using gcc Ubuntu/Linaro 4.5.1-7ubuntu2, but I get the same code
with gcc 4.4.
Here's the relevant C code:
char skip[] = { /* ... */ };

int foo(const unsigned char *str, int len)
{
    int result = 0;
    int i = 7;
    while (i < len) {
        if (str[i] == '_' && str[i-1] == 'D') {
            result |= 2;
        }
        i += skip[str[i]];
    }
    return result;
}
And here's the disassembly from gcc-4.5 at -O2 and -O3 (they're the same):
0000000000000000 <foo>:
   0:   31 c0                   xor    eax,eax
   2:   83 fe 07                cmp    esi,0x7
   5:   ba 07 00 00 00          mov    edx,0x7
   a:   7f 14                   jg     20 <foo+0x20>
   c:   eb 32                   jmp    40 <foo+0x40>
   e:   66 90                   xchg   ax,ax
// Beginning of loop
  10:   0f b6 c9                movzx  ecx,cl
  13:   0f be 89 00 00 00 00    movsx  ecx,BYTE PTR [rcx+0x0]
// (the 0x0 above is replaced by the linker with the address of skip)
  1a:   01 ca                   add    edx,ecx
  1c:   39 d6                   cmp    esi,edx
  1e:   7e 20                   jle    40 <foo+0x40>
  20:   4c 63 c2                movsxd r8,edx
  23:   42 0f b6 0c 07          movzx  ecx,BYTE PTR [rdi+r8*1]
  28:   80 f9 5f                cmp    cl,0x5f
  2b:   75 e3                   jne    10 <foo+0x10>
// Likely end of loop (i.e. branch above is likely taken)
  2d:   41 89 c1                mov    r9d,eax
  30:   41 83 c9 02             or     r9d,0x2
  34:   41 80 7c 38 ff 44       cmp    BYTE PTR [r8+rdi*1-0x1],0x44
  3a:   41 0f 44 c1             cmove  eax,r9d
  3e:   eb d0                   jmp    10 <foo+0x10>
  40:   f3 c3                   repz ret
I'm confused about line 10, |movzx ecx, cl|. As I understand, this is
truncating ecx so it's 0 everywhere except the least-significant byte.
But we can only get to line 10 if the last write to ecx was in line
23, which already truncated the top bits of the register.
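To make the redundancy concrete, here is a minimal C sketch of the two
instructions. The helper names are hypothetical (they are not part of the
code above) and only model what each movzx does to the register:

```c
#include <stdint.h>

/* Models `movzx ecx,cl` (line 10): keep the low 8 bits of the
 * register and clear bits 8..31. */
static uint32_t zero_extend_low_byte(uint32_t reg)
{
    return (uint32_t)(uint8_t)reg;
}

/* Models `movzx ecx,BYTE PTR [rdi+r8*1]` (line 23): load one byte
 * from memory and zero-extend it to 32 bits. */
static uint32_t load_byte_zero_extended(const unsigned char *p, long idx)
{
    return (uint32_t)p[idx];
}
```

For any byte b, load_byte_zero_extended already returns a value below 256,
so applying zero_extend_low_byte to it changes nothing. That is the sense
in which the instruction at line 10 looks unnecessary on this path.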
If I change |str| in the C code to a signed char, then line 10 becomes
movsx, which is sensible. That the instruction has a parallel in the
signed char case suggests to me that it's probably not an intentional
nop.
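The movzx/movsx difference mirrors how C widens the byte before it is used
as an index into skip. A small sketch, with hypothetical helper names used
only for illustration:

```c
/* An unsigned char is widened by zero extension (movzx): the result
 * is always in the range 0..255. */
static int widen_unsigned(unsigned char c)
{
    return c;
}

/* A signed char is widened by sign extension (movsx): the result can
 * be negative, which would make skip[c] index before the array. */
static int widen_signed(signed char c)
{
    return c;
}
```

The two agree for values in 0..127 and differ only when the top bit of the
byte is set.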
The pass that typically tracks zero/nonzero information and tries to
eliminate redundant zero/sign extensions is combine, which only operates
on insns found within a single basic block. In this case the
instructions at 0x10 and 0x23 are in different basic blocks, so combine
makes a worst-case assumption.
Jeff