Re: Unnecessary movzx instruction?

 On 11/07/10 11:00, Justin Lebar wrote:
Hi, all.

I'm confused about an instruction gcc is generating.  It looks
unnecessary to me, but also doesn't appear to be an intentional nop.
I'm no expert, so it's likely I'm misunderstanding something.

I'm using gcc Ubuntu/Linaro 4.5.1-7ubuntu2, but I get the same code
with gcc 4.4.

Here's the relevant C code:

char skip[] = { /* ... */ };

int foo(const unsigned char *str, int len)
{
   int result = 0;
   int i = 7;

   while (i < len) {
     if (str[i] == '_' && str[i-1] == 'D') {
       result |= 2;
     }
     i += skip[str[i]];
   }

   return result;
}

And here's the disassembly from gcc-4.5 at -O2 and -O3 (they're the same):

0000000000000000<foo>:
    0:	31 c0                	xor    eax,eax
    2:	83 fe 07             	cmp    esi,0x7
    5:	ba 07 00 00 00       	mov    edx,0x7
    a:	7f 14                	jg     20<foo+0x20>
    c:	eb 32                	jmp    40<foo+0x40>
    e:	66 90                	xchg   ax,ax

// Beginning of loop

   10:	0f b6 c9             	movzx  ecx,cl
   13:	0f be 89 00 00 00 00 	movsx  ecx,BYTE PTR [rcx+0x0] // 0x0 replaced by linker with addr of skip
   1a:	01 ca                	add    edx,ecx
   1c:	39 d6                	cmp    esi,edx
   1e:	7e 20                	jle    40<foo+0x40>
   20:	4c 63 c2             	movsxd r8,edx
   23:	42 0f b6 0c 07       	movzx  ecx,BYTE PTR [rdi+r8*1]
   28:	80 f9 5f             	cmp    cl,0x5f
   2b:	75 e3                	jne    10<foo+0x10>

// Likely end of loop (i.e. branch above is likely taken)

   2d:	41 89 c1             	mov    r9d,eax
   30:	41 83 c9 02          	or     r9d,0x2
   34:	41 80 7c 38 ff 44    	cmp    BYTE PTR [r8+rdi*1-0x1],0x44
   3a:	41 0f 44 c1          	cmove  eax,r9d
   3e:	eb d0                	jmp    10<foo+0x10>
   40:	f3 c3                	repz ret


I'm confused about line 10, |movzx ecx, cl|.  As I understand, this is
truncating ecx so it's 0 everywhere except the least-significant byte.
But we can only get to line 10 if the last write to ecx was in line
23, which already truncated the top bits of the register.
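
The redundancy described above can be checked with a small C model. This is only a sketch: the helper names movzx_load, movzx_reg and second_movzx_is_redundant are made up for illustration, and the functions imitate the two instructions in C rather than executing them.

```c
#include <stdint.h>

/* Hypothetical model of `movzx ecx, BYTE PTR [mem]` (address 0x23):
   load one byte and zero-extend it to 32 bits. */
static uint32_t movzx_load(const unsigned char *p)
{
    return (uint32_t)*p;
}

/* Hypothetical model of `movzx ecx, cl` (address 0x10):
   keep the low byte and clear bits 8..31. */
static uint32_t movzx_reg(uint32_t ecx)
{
    return ecx & 0xFFu;
}

/* Returns 1 if, for every possible byte value, the second extension
   leaves the result of the first one unchanged. */
static int second_movzx_is_redundant(void)
{
    for (unsigned v = 0; v <= 0xFFu; v++) {
        unsigned char byte = (unsigned char)v;
        uint32_t ecx = movzx_load(&byte);
        if (movzx_reg(ecx) != ecx)
            return 0;
    }
    return 1;
}
```

Calling second_movzx_is_redundant() walks all 256 byte values, so it only covers the case where the load at 23 is the last write to ecx before control reaches 10.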

If I change |str| in the C code to a signed char, then line 10 becomes
movsx, which is sensible.  That the instruction has a parallel in the
signed char case suggests to me that it's probably not an intentional

The pass that typically tracks zero/nonzero information and tries to eliminate redundant zero/sign extensions is combine, which only operates on insns found within a single basic block. In this case the instructions at 0x10 and 0x23 are in different basic blocks, so combine makes a worst-case assumption.
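
To illustrate the effect of that block-local restriction, here is a toy sketch in C. It is not GCC's combine pass or its data structures: the enum, struct insn and count_removable are hypothetical names, and the model tracks a single register only. It forgets everything it knows at each basic-block boundary, which is the worst-case assumption described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction kinds for a one-register toy model. */
enum op { OP_LOAD_ZEXT8, OP_ZEXT8, OP_OTHER };

struct insn {
    enum op op;   /* what the instruction does to the register */
    int block;    /* id of the basic block containing it */
};

/* Counts the OP_ZEXT8 instructions that a purely block-local tracker
   can prove redundant. */
static int count_removable(const struct insn *code, size_t n)
{
    const uint32_t high = 0xFFFFFF00u;
    uint32_t known_zero = 0;   /* bits proven zero in the register */
    int cur_block = -1;
    int removable = 0;

    for (size_t i = 0; i < n; i++) {
        if (code[i].block != cur_block) {
            cur_block = code[i].block;
            known_zero = 0;    /* worst case: nothing known on block entry */
        }
        switch (code[i].op) {
        case OP_LOAD_ZEXT8:
            known_zero = high; /* byte load zero-extends: bits 8..31 are zero */
            break;
        case OP_ZEXT8:
            if ((known_zero & high) == high)
                removable++;   /* high bits already zero: extension is a no-op */
            known_zero = high;
            break;
        case OP_OTHER:
            known_zero = 0;    /* unknown write: forget what was known */
            break;
        }
    }
    return removable;
}
```

With a byte load and an extension in the same block, this model reports one removable instruction. With the same two instructions split across blocks, as at 0x23 and 0x10 here, it reports none.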

Jeff

