Zeev Tarantov <zeev.tarantov@xxxxxxxxx> writes:

> This computes the population count using an 8-bit look up table by
> iterating over the 8 bytes of the input and summing the looked-up
> values.
> This is the right code for "int popcount(unsigned long x)", not for
> "int popcount (unsigned int x)".
> It performs twice the amount of work needed.

First I should say that for x86_64, if you know that you are using
processors with SSE4.2 or ABM support, you can use -mpopcnt, or an
appropriate -march= option, to direct gcc to use the hardware popcnt
instruction.

Other than that, this is in effect a minor optimization bug.  The
underlying reason is that for simplicity in dealing with the library
support functions, gcc always promotes to the register size before
calling them.  This zero-extension costs nothing on x86_64, and for
most library functions it makes little performance difference whether
they operate on a 32-bit or 64-bit value.  The __builtin_popcount
function is an exception.

Please consider filing a bug report; see http://gcc.gnu.org/bugs/ .

Ian