On Thu, May 24, 2012 at 2:40 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Could you use cpu_to_be32/64() and then ffs()? That ought to work for both
> variants of endianness. The cpu_to_beXX() should be a noop on BE and is
> likely to be a single instruction on LE. The meat of ffs() is usually a
> single instruction, though it may have to have zero-detect logic added.

First off, the *last* thing you want to do is go to big-endian mode. All
the bit counting gets *much* more complicated, and your argument that
it's "free" on some architectures is pointless, since it is only free on
the architectures that have the *least* users.

Secondly, it's not "likely a single instruction" on LE, and neither is
ffs. It can be, but it's often one of the slower instructions. Many
architectures will have - but only in their most recent uarch versions -
popcount or similar, and if you're little-endian, that would be what you
want. Except we already figured out faster versions for little-endian
based on multiplication or a few add/shift operations.

                Linus
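For readers outside the thread, here is a minimal sketch of the kind of multiplication-based little-endian trick being referred to: find the index of the first zero byte in a 64-bit word without byte-swapping and without ffs(). It assumes 64-bit words loaded little-endian and the classic "(x - 0x01..01) & ~x & 0x80..80" zero-byte test. The helper names (zero_byte_bits, bytes_before_first_zero, first_zero_byte_le) are hypothetical and not the kernel's API. It is modeled on, not copied from, the kernel's word-at-a-time code.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BYTES  0x0101010101010101ULL
#define HIGH_BYTES 0x8080808080808080ULL

/* Nonzero iff some byte of w is zero. On a little-endian load, the
 * lowest set 0x80 bit marks the first zero byte; borrow-induced false
 * positives can only appear in bytes above a real zero byte. */
static inline uint64_t zero_byte_bits(uint64_t w)
{
	return (w - ONE_BYTES) & ~w & HIGH_BYTES;
}

/* Turn the (nonzero) result above into a mask of whole 0xff bytes
 * covering every byte that precedes the first zero byte. */
static inline uint64_t bytes_before_first_zero(uint64_t bits)
{
	assert(bits != 0);           /* only meaningful if a zero byte exists */
	bits = (bits - 1) & ~bits;   /* all bits below the lowest set bit */
	return bits >> 7;            /* k zero bytes -> k full 0xff bytes */
}

/* Count the 0xff bytes in a mask of the form 0x00..00ff..ff using one
 * multiply: the top byte of mask * 0x0001020304050608 equals the count. */
static inline unsigned count_masked_bytes(uint64_t mask)
{
	return (unsigned)((mask * 0x0001020304050608ULL) >> 56);
}

/* Index (0..7) of the first zero byte of a little-endian word,
 * or 8 if the word contains no zero byte (a choice made for this sketch). */
static inline unsigned first_zero_byte_le(uint64_t w)
{
	uint64_t bits = zero_byte_bits(w);

	if (!bits)
		return 8;
	return count_masked_bytes(bytes_before_first_zero(bits));
}
```

For example, the bytes "ab\0" loaded little-endian give 0x6261 and first_zero_byte_le() returns 2. The design point matching the email is that everything here is subtract/and/shift plus a single multiply, so no byte swap and no bit-scan instruction is needed on little-endian machines. On 32-bit, a few add/shift operations can replace the multiply.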