At 04:37 PM 1/20/2006, Martijn van Oosterhout wrote:
On Fri, Jan 20, 2006 at 04:19:15PM -0500, Tom Lane wrote:
>   %   cumulative    self                self     total
>  time    seconds   seconds     calls  Ks/call  Ks/call  name
>  98.96   1495.93   1495.93  33035195     0.00     0.00  hemdistsign
<snip>
> So we gotta fix hemdistsign ...
lol! Yeah, I guess so. Pretty nasty loop. LOOPBIT will iterate 8*63=504
times and it's going to do silly bit handling on each and every
iteration.
Given that all it's doing is counting bits, a simple fix would be to
loop over bytes, use XOR and count ones. For extreme speedup create a
lookup table with 256 entries to give you the answer straight away...
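
A minimal sketch of that byte-wise approach in C, assuming the two
signatures are byte arrays of equal length. The names bits_in_byte,
init_bits_in_byte and hamming_distance are made up for illustration and
are not taken from the tsearch2 source:

```c
#include <stddef.h>

/* Hypothetical lookup table: entry i holds the number of 1 bits in i. */
static unsigned char bits_in_byte[256];

/* Fill the table once at startup. Entry 0 is already 0 (static storage),
 * and every other entry reuses the count for the value shifted right. */
static void init_bits_in_byte(void)
{
    for (int i = 1; i < 256; i++)
        bits_in_byte[i] = (unsigned char) ((i & 1) + bits_in_byte[i >> 1]);
}

/* Hamming distance between two signatures of len bytes each:
 * XOR leaves a 1 wherever the inputs differ, the table counts the 1s. */
static int hamming_distance(const unsigned char *a,
                            const unsigned char *b,
                            size_t len)
{
    int dist = 0;

    for (size_t i = 0; i < len; i++)
        dist += bits_in_byte[a[i] ^ b[i]];
    return dist;
}
```

Once init_bits_in_byte() has run, each call costs one XOR and one table
lookup per byte instead of one test per bit.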
For an even more extreme speedup, don't most modern CPUs have an asm
instruction that counts the bits set (or unset), AKA "population
counting", in entities of various sizes (4, 8, 16, 32, and 64 bits, plus
128 bits on 64-bit CPUs with SWAR instructions)?
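
A rough sketch of the population-count idea, assuming GCC or Clang, which
provide the __builtin_popcount() family. The function name is
hypothetical, and whether the builtin turns into a single machine
instruction depends on the target CPU and compiler flags; otherwise the
compiler emits a software fallback:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the tsearch2 source.
 * Assumes a compiler with __builtin_popcount/__builtin_popcountll. */
static int hamming_distance_popcount(const unsigned char *a,
                                     const unsigned char *b,
                                     size_t len)
{
    int dist = 0;
    size_t i = 0;

    /* Compare eight bytes per step; memcpy sidesteps alignment issues. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t))
    {
        uint64_t wa, wb;

        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        dist += __builtin_popcountll(wa ^ wb);
    }

    /* Finish the tail bytes that do not fill a whole 64-bit word. */
    for (; i < len; i++)
        dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));

    return dist;
}
```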
Ron