Function "int __builtin_popcount (unsigned int x)" documented at: http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fpopcount-3160 on x86-64 with -O2, both using version 4.5.2 and a recent 4.6 branch, I get this code emitted: 0000000000402230 <__popcountdi2>: 402230: 48 8b 15 a9 0d 20 00 mov 0x200da9(%rip),%rdx # 602fe0 <_DYNAMIC+0x1a8> 402237: 31 c0 xor %eax,%eax 402239: 31 c9 xor %ecx,%ecx 40223b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 402240: 48 89 fe mov %rdi,%rsi 402243: 48 d3 ee shr %cl,%rsi 402246: 83 c1 08 add $0x8,%ecx 402249: 81 e6 ff 00 00 00 and $0xff,%esi 40224f: 0f b6 34 32 movzbl (%rdx,%rsi,1),%esi 402253: 01 f0 add %esi,%eax 402255: 83 f9 40 cmp $0x40,%ecx 402258: 75 e6 jne 402240 <__popcountdi2+0x10> 40225a: f3 c3 repz retq This computes the population count using an 8-bit look up table by iterating over the 8 bytes of the input and summing the looked-up values. This is the right code for "int popcount(unsigned long x)", not for "int popcount (unsigned int x)". It performs twice the amount of work needed. The only way a 64bit function would be ok to use when a 32bit is required is if it would short-circuit when input reaches zero, like this: .cfi_startproc xorl %eax, %eax .p2align 4,,10 .p2align 3 .L2: movzbl %dil, %edx shrq $8, %rdi movzbl bitsums(%rdx), %edx addl %edx, %eax testq %rdi, %rdi jne .L2 rep ret .cfi_endproc My question is whether this behavior is intentional, and whether gcc could expose a built-in 64 bit popcount function instead of a 32-bit one. -Z.T.