Re: __builtin_popcount

Zeev Tarantov <zeev.tarantov@xxxxxxxxx> writes:

> This computes the population count using an 8-bit look up table by
> iterating over the 8 bytes of the input and summing the looked-up
> values.
> This is the right code for "int popcount(unsigned long x)", not for
> "int popcount (unsigned int x)".
> It performs twice the amount of work needed.

First I should say that for x86_64, if you know that you are using
processors with SSE4.2 or ABM support, you can use -mpopcnt, or an
appropriate -march= option, to direct gcc to use the hardware popcnt
instruction.
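
As a minimal sketch of what that looks like in practice (the wrapper
names count_bits32 and count_bits64 are hypothetical, and the build
line in the comment is only one possible invocation), the builtins can
be wrapped like this:

```c
/* One possible build line: gcc -O2 -mpopcnt example.c
 * With -mpopcnt (or a -march= option that implies it), gcc can emit
 * the hardware popcnt instruction for these builtins instead of
 * calling a library support function. */
#include <stdint.h>

/* Hypothetical wrapper names, chosen for this illustration only. */
int count_bits32(uint32_t x)
{
    return __builtin_popcount(x);    /* argument type: unsigned int */
}

int count_bits64(uint64_t x)
{
    return __builtin_popcountll(x);  /* argument type: unsigned long long */
}
```

Comparing the output of "gcc -O2 -S" with and without -mpopcnt should
show whether a popcnt instruction or a library call is generated.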

Other than that, this is in effect a minor missed-optimization bug.  The
underlying reason is that, to keep the handling of the library support
functions simple, gcc always promotes the argument to the register size
before calling them.  On x86_64 the zero-extension itself costs nothing,
and for most library functions it makes little performance difference
whether they operate on a 32-bit or a 64-bit value.  The
__builtin_popcount function is an exception: the table-driven fallback
does one lookup per byte, so the promoted call performs eight lookups
where four would do.
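
To make the cost concrete, here is a rough sketch of a byte-table
popcount of the kind described in the quoted text.  It is not the
actual library source; the names bit_table, popcount_bytes,
popcount_promoted and popcount_narrow are made up for illustration.

```c
#include <stdint.h>

/* bit_table[i] holds the number of set bits in the byte value i. */
static unsigned char bit_table[256];
static int table_ready = 0;

static void init_table(void)
{
    /* bit_table[0] is already 0; each entry builds on i >> 1. */
    for (int i = 1; i < 256; i++)
        bit_table[i] = (unsigned char)((i & 1) + bit_table[i >> 1]);
    table_ready = 1;
}

/* Sum the table entries for the low nbytes bytes of x. */
static int popcount_bytes(uint64_t x, int nbytes)
{
    if (!table_ready)
        init_table();
    int total = 0;
    for (int i = 0; i < nbytes; i++) {
        total += bit_table[x & 0xFF];
        x >>= 8;
    }
    return total;
}

/* Models the promoted call: zero-extend to 64 bits, walk 8 bytes. */
int popcount_promoted(uint32_t x)
{
    return popcount_bytes((uint64_t)x, 8);
}

/* What a 32-bit-specific routine would need: walk 4 bytes only. */
int popcount_narrow(uint32_t x)
{
    return popcount_bytes(x, 4);
}
```

Both functions return the same result for any 32-bit input.  The
promoted version simply spends its last four lookups on bytes that are
known to be zero.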

Please consider filing a bug report; see http://gcc.gnu.org/bugs/ .

Ian

