Zeev Tarantov <zeev.tarantov@xxxxxxxxx> writes:

> When compiling this code:
>
> unsigned int get_le32(unsigned char *p)
> {
>   return p[0] | p[1] << 8 | p[2] << 16 | p[3] << 24;
> }
>
> On gcc 4.6.0 rev. 172266 for x86-64, I get:
>
>         movzbl  1(%rdi), %eax
>         movzbl  2(%rdi), %edx
>         sall    $8, %eax
>         sall    $16, %edx
>         orl     %edx, %eax
>         movzbl  (%rdi), %edx
>         orl     %edx, %eax
>         movzbl  3(%rdi), %edx
>         sall    $24, %edx
>         orl     %edx, %eax
>         ret
>
> I hoped for much better code. I hoped to avoid ifdefs depending on
> endianness, but this means I can't.
> Am I missing something obvious that precludes the compiler from
> optimizing the expression?
> This is not a regression, and other compilers didn't do any better, so
> I hope I'm just missing something.

The compiler could in principle do better.  Doing so basically requires
a large pattern match, essentially a new compiler pass looking for this
specific type of code, which it currently does not have.

For example, there is such a special optimization pass which looks for
cases that can use the x86 bswap instruction.  This code

unsigned int f(unsigned int *p)
{
  unsigned x = *p;
  return (((x >> 24) & 0xff)
          | ((x >> 8) & 0xff00)
          | ((x << 8) & 0xff0000)
          | ((x << 24) & 0xff000000));
}

on x86_64 compiles into

        movl    (%rdi), %eax
        bswap   %eax
        ret

So something similar could be done for your test case.

Actually, though, your test case is harder, because p might be
misaligned.  If it is misaligned, the byte loads might actually be
faster.  But even when using __attribute__ ((aligned (N))), a new
optimization pass would be required to look for this case.

Ian