On Mon, Dec 04, 2023 at 10:22:58AM +0800, John Sanpe wrote:
> Replace the internal table lookup algorithm with the hweight
> library, which has instruction set acceleration.

This is undeniably better, but why stop here?  Instead of working one
byte at a time, you could work an entire word at a time and use
hweight_long().

Also, if you're in the mood for a second patch, free_bit[] is clearly
an open-coding of ffz().
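
For illustration, both suggestions could look roughly like the userspace
sketch below. It is not taken from the patch: the GCC/Clang builtins only
stand in for the kernel's hweight_long() and ffz(), and the sketch_* names
and the bitmap layout (an array of unsigned long) are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's hweight_long(): number of set
 * bits in one word. Assumes a GCC/Clang-compatible compiler.
 */
static inline unsigned int sketch_hweight_long(unsigned long w)
{
	return (unsigned int)__builtin_popcountl(w);
}

/*
 * Userspace stand-in for the kernel's ffz(): index of the first zero
 * bit in a word. As with ffz(), the result is undefined when the word
 * has no zero bit, so this sketch asserts on that case.
 */
static inline unsigned long sketch_ffz(unsigned long w)
{
	assert(~w != 0UL);
	return (unsigned long)__builtin_ctzl(~w);
}

/*
 * Hypothetical helper: count the set bits of a bitmap one word at a
 * time instead of one byte at a time.
 */
static size_t sketch_count_used(const unsigned long *map, size_t nwords)
{
	size_t used = 0;

	for (size_t i = 0; i < nwords; i++)
		used += sketch_hweight_long(map[i]);

	return used;
}
```

In the kernel itself the two stand-ins would simply be hweight_long() and
ffz(); the caller still has to handle a full word before asking for its
first zero bit.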