I needed to write a portable and fast C/C++ function to bit-negate the most significant byte of a 32-bit integer. Currently, my target is i386. First, I wrote this function:

    uint32_t BitNegMsb_1(uint32_t v) {
        return (v & 0xFFFFFFu) | (~v & 0xFF000000u);
    }

And compiled it with gcc (I compiled all the code referred to in this e-mail with -O3). The compilation result was:

    08048510 <_Z11BitNegMsb_1j>:
     8048510:  55                    push   %ebp
     8048511:  89 e5                 mov    %esp,%ebp
     8048513:  8b 55 08              mov    0x8(%ebp),%edx
     8048516:  5d                    pop    %ebp
     8048517:  89 d0                 mov    %edx,%eax
     8048519:  81 e2 ff ff ff 00     and    $0xffffff,%edx
     804851f:  f7 d0                 not    %eax
     8048521:  25 00 00 00 ff        and    $0xff000000,%eax
     8048526:  09 d0                 or     %edx,%eax
     8048528:  c3                    ret

Which was literally what I wrote, but in assembler.

Then I decided to write architecture-specific code, relying on the fact that i386 is little-endian (so the most significant byte of the integer is stored last, at uc[3]), in the hope that gcc would generate different code for it, as I felt this could give it more freedom to optimize (just a gut feeling):

    uint32_t BitNegMsb_2(uint32_t v) { //
        union {
            uint32_t u;
            unsigned char uc[4];
        } uv;
        uv.u = v;
        uv.uc[3] = ~uv.uc[3];
        return uv.u;
    }

The generated code turned out to be different (although of the same length) and based on masks and shifts, not just masks:

    08048530 <_Z11BitNegMsb_2j>:
     8048530:  55                    push   %ebp
     8048531:  89 e5                 mov    %esp,%ebp
     8048533:  8b 45 08              mov    0x8(%ebp),%eax
     8048536:  5d                    pop    %ebp
     8048537:  89 c2                 mov    %eax,%edx
     8048539:  25 ff ff ff 00        and    $0xffffff,%eax
     804853e:  c1 ea 18              shr    $0x18,%edx
     8048541:  f7 d2                 not    %edx
     8048543:  c1 e2 18              shl    $0x18,%edx
     8048546:  09 d0                 or     %edx,%eax
     8048548:  c3                    ret
     8048549:  8d b4 26 00 00 00 00  lea    0x0(%esi,%eiz,1),%esi

I said to myself: "aha, why don't I get myself the best of two worlds -- code that is both portable and optimal from the g++ -O3 perspective", so I wrote the following function:

    uint32_t BitNegMsb_3(uint32_t v) {
        return (v & 0x00FFFFFF) | (~(v >> 24) << 24);
    }

which IMHO is exactly what the generated code for BitNegMsb_2 does, but in portable C/C++.
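
For anyone who wants to check that the three variants really compute the same value, a small test along the following lines should do. This is only a sketch under stated assumptions: it is written as plain C, it assumes a little-endian target (required by the union variant), and it relies on the compiler accepting union-based byte access, as gcc does. AllVariantsAgree and CheckSamples are hypothetical helper names that are not part of my original code, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant 1: keep the low 24 bits, negate and keep the top 8 bits. */
uint32_t BitNegMsb_1(uint32_t v) {
    return (v & 0xFFFFFFu) | (~v & 0xFF000000u);
}

/* Variant 2: byte access through a union.
   ASSUMPTION: little-endian target, so uc[3] is the most significant byte. */
uint32_t BitNegMsb_2(uint32_t v) {
    union { uint32_t u; unsigned char uc[4]; } uv;
    uv.u = v;
    uv.uc[3] = (unsigned char)~uv.uc[3];
    return uv.u;
}

/* Variant 3: shift the top byte down, negate it, shift it back up. */
uint32_t BitNegMsb_3(uint32_t v) {
    return (v & 0x00FFFFFFu) | (~(v >> 24) << 24);
}

/* Hypothetical helper: nonzero if all three variants agree for v. */
int AllVariantsAgree(uint32_t v) {
    uint32_t r = BitNegMsb_1(v);
    return r == BitNegMsb_2(v) && r == BitNegMsb_3(v);
}

/* Hypothetical helper: aborts via assert if any sample value disagrees. */
void CheckSamples(void) {
    const uint32_t samples[] = {
        0x00000000u, 0xFFFFFFFFu, 0x12345678u, 0x80000001u
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(AllVariantsAgree(samples[i]));
    }
}
```

Calling CheckSamples() from main is enough to run it. Agreement on a handful of samples is of course not a proof for all inputs, and it says nothing about which instruction sequence is faster.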

To my surprise, the code generated for BitNegMsb_3 was exactly like the code generated for BitNegMsb_1 (not BitNegMsb_2!).

My questions are:

1. Are functions BitNegMsb_1 and BitNegMsb_3, on the one hand, and BitNegMsb_2, on the other, not semantically equivalent from gcc's code-generation perspective?
2. If they are not semantically equivalent, what is the difference?
3. If they are, why does gcc generate different code for them, and which is best-performing "from gcc's perspective"?

gcc seems to be inconsistent in this case (equivalent source yields different instruction sequences), so I am curious.

Thanks in advance,
-Pavel