an inconsistency in generated code?

Pavel Tolkachev <paultolk@xxxxxxxxx> · Mon, 15 Aug 2011 18:06:55 -0700 (PDT)

I needed to write a portable and fast C/C++ function to bit-negate the
most significant byte of a 32-bit integer. Currently, my target is i386.
First, I wrote this function:

uint32_t
BitNegMsb_1(uint32_t v)
{
  return (v & 0xFFFFFFu) | (~v & 0xFF000000u);
}

And compiled it with gcc (I compiled all the code referred to in this e-mail with -O3). The compilation result was:

08048510 <_Z11BitNegMsb_1j>:
 8048510:    55                       push   %ebp
 8048511:    89 e5                    mov    %esp,%ebp
 8048513:    8b 55 08                 mov    0x8(%ebp),%edx
 8048516:    5d                       pop    %ebp
 8048517:    89 d0                    mov    %edx,%eax
 8048519:    81 e2 ff ff ff 00        and    $0xffffff,%edx
 804851f:    f7 d0                    not    %eax
 8048521:    25 00 00 00 ff           and    $0xff000000,%eax
 8048526:    09 d0                    or     %edx,%eax
 8048528:    c3                       ret

This is literally what I wrote, but in assembler.

Then I decided to write architecture-specific code, relying on the fact
that i386 is little-endian (so byte 3 of the integer in memory is the most
significant one), in the hope that gcc would generate different code for
it, as I felt this could give it more freedom to optimize (just a gut
feeling):

uint32_t
BitNegMsb_2(uint32_t v)
{
  union {
    uint32_t u;
    unsigned char uc[4];
  } uv;
  uv.u = v;
  uv.uc[3] = ~uv.uc[3];
  return uv.u;
}

The generated code turned out to be different (although of the same length) and based on masks-and-shifts, not just masks:

08048530 <_Z11BitNegMsb_2j>:
 8048530:    55                       push   %ebp
 8048531:    89 e5                    mov    %esp,%ebp
 8048533:    8b 45 08                 mov    0x8(%ebp),%eax
 8048536:    5d                       pop    %ebp
 8048537:    89 c2                    mov    %eax,%edx
 8048539:    25 ff ff ff 00           and    $0xffffff,%eax
 804853e:    c1 ea 18                 shr    $0x18,%edx
 8048541:    f7 d2                    not    %edx
 8048543:    c1 e2 18                 shl    $0x18,%edx
 8048546:    09 d0                    or     %edx,%eax
 8048548:    c3                       ret    
 8048549:    8d b4 26 00 00 00 00     lea    0x0(%esi,%eiz,1),%esi
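
Since BitNegMsb_2 hard-codes index 3, it only flips the most significant byte on a little-endian host. Also, as far as I know, reading a union member other than the one last written is formally undefined in C++, although GCC documents it as supported. A minimal sketch of a byte-wise variant that avoids both assumptions is below. MsbByteIndex and BitNegMsb_bytes are hypothetical names that are not part of the code above, and the sketch assumes 8-bit bytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: index of the in-memory byte that holds the most
// significant byte of a uint32_t (3 on little-endian hosts such as i386,
// 0 on big-endian hosts). Assumes 8-bit bytes.
int MsbByteIndex()
{
  const uint32_t probe = 0xAA000000u;        // only the top byte is non-zero
  unsigned char bytes[sizeof probe];
  std::memcpy(bytes, &probe, sizeof probe);  // inspect the object representation
  for (std::size_t i = 0; i < sizeof probe; ++i)
    if (bytes[i] == 0xAA) return static_cast<int>(i);
  return 0;  // not expected to be reached with 8-bit bytes
}

// Hypothetical byte-wise variant of BitNegMsb_2 that does not hard-code index 3
// and uses memcpy instead of a union.
uint32_t BitNegMsb_bytes(uint32_t v)
{
  unsigned char bytes[sizeof v];
  std::memcpy(bytes, &v, sizeof v);
  const int i = MsbByteIndex();
  bytes[i] = static_cast<unsigned char>(~bytes[i]);  // flip all bits of the top byte
  std::memcpy(&v, bytes, sizeof v);
  return v;
}
```

I have not looked at what code gcc generates for this variant, so it is meant only to illustrate the byte-order dependency, not as a faster alternative.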

I said to myself: "aha, why don't I get the best of both worlds -- code
that is both portable and optimal from the g++ -O3 perspective", so I
wrote the following function:

uint32_t
BitNegMsb_3(uint32_t v)
{
  return (v & 0x00FFFFFF) | (~(v >> 24) << 24);
}

which IMHO is exactly what the generated code for BitNegMsb_2 does, but in
portable C/C++. To my surprise, the code generated for BitNegMsb_3 was
exactly like the code generated for BitNegMsb_1 (not BitNegMsb_2!). My
questions are:

1. Are the functions BitNegMsb_1 and BitNegMsb_3, on the one hand, and
BitNegMsb_2, on the other, not semantically equivalent from gcc's
code-generation perspective?
2. If they are not semantically equivalent, what is the difference?
3. If they are, why does gcc generate different code for them, and which
version performs best "from gcc's perspective"? gcc seems inconsistent in
this case, so I am curious.
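
For reference, here is a minimal sketch that checks the two portable expressions against each other on a few sample values. It also includes an XOR form, which is not one of the functions above: XOR with a mask flips exactly the masked bits, so it should give the same result. The names NegMsbMask, NegMsbShift, NegMsbXor and CheckAll are made up for this sketch, and the sample values are arbitrary:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same expression as BitNegMsb_1 (renamed to keep the sketch self-contained).
uint32_t NegMsbMask(uint32_t v)  { return (v & 0xFFFFFFu) | (~v & 0xFF000000u); }

// Same expression as BitNegMsb_3.
uint32_t NegMsbShift(uint32_t v) { return (v & 0x00FFFFFFu) | (~(v >> 24) << 24); }

// Hypothetical alternative, not in the post: XOR flips exactly the bits set in the mask.
uint32_t NegMsbXor(uint32_t v)   { return v ^ 0xFF000000u; }

// Asserts that all three forms agree on a handful of arbitrary sample values.
void CheckAll()
{
  const uint32_t samples[] = {0u, 1u, 0x12345678u, 0x80000000u,
                              0x00FFFFFFu, 0xFFFFFFFFu};
  for (std::size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
    const uint32_t v = samples[i];
    assert(NegMsbMask(v) == NegMsbShift(v));
    assert(NegMsbMask(v) == NegMsbXor(v));
  }
}
```

A spot check like this says nothing about which form gcc compiles best; it only shows that the expressions agree on the tested inputs.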

Thanks in advance,
-Pavel