Hi,

I've noticed that casting to uint8_t often yields slightly more compact
code compared to masking with 0xFF, when the result should be
equivalent. To give a pointless example,

#include <stdint.h>

unsigned u;
uint8_t u8;

void f() {
    u = (u8 << 3) & 0xFF;
}

void g() {
    u = (uint8_t)(u8 << 3);
}

becomes

00000000 <f()>:
   0:   0f b6 05 00 00 00 00    movzbl 0x0,%eax
   7:   c1 e0 03                shl    $0x3,%eax
   a:   25 ff 00 00 00          and    $0xff,%eax
   f:   a3 00 00 00 00          mov    %eax,0x0
  14:   c3                      ret
  15:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  19:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000020 <g()>:
  20:   0f b6 05 00 00 00 00    movzbl 0x0,%eax
  27:   c1 e0 03                shl    $0x3,%eax
  2a:   0f b6 c0                movzbl %al,%eax
  2d:   a3 00 00 00 00          mov    %eax,0x0
  32:   c3                      ret

Is there a good reason why GCC picks the first version for the bitmask,
or is it just a failure of optimization? When might the cast generate
worse code?

/Ulf
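
P.S. To double-check that the two forms really are equivalent, here is a
quick exhaustive test (just a sketch; it reuses f() and g() from above
and only prints something if the stored values ever differ):

#include <stdint.h>
#include <stdio.h>

unsigned u;
uint8_t u8;

void f() { u = (u8 << 3) & 0xFF; }
void g() { u = (uint8_t)(u8 << 3); }

int main(void)
{
    /* Try every possible uint8_t input and compare what f() and g()
       leave in u. */
    for (int i = 0; i <= 0xFF; i++) {
        unsigned a, b;
        u8 = (uint8_t)i;
        f(); a = u;
        g(); b = u;
        if (a != b)
            printf("mismatch for u8 = %d: %u vs %u\n", i, a, b);
    }
    return 0;
}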