On Thu, May 24, 2012 at 9:45 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> I didn't suggest it was free, but it might be cheaper. Besides x86/x86_64 has
> BSF/BSR instructions - though having played with Dave's algorithm some, I
> don't think they're usable for this.

David, I don't think you've followed this saga very well.

That word-at-a-time algorithm *comes* from x86. Literally. The code is
90% pure copies from arch/x86/lib/usercopy.c.

We *know* how the code should be written on little-endian, AND IT IS
BOTH BETTER AND FASTER there.

Seriously. Little-endian is *superior* for string handling. No
questions. In fact, anybody who thinks that big-endian is better for
*anything* is seriously deluded these days.

Also, BSF is too damn slow. It's slow as hell on most older x86 chips,
and it's pointless. You can do clever tricks (again - only on
little-endian) that do it portably without it. Grep for
count_masked_bytes in the current tree, which does need a fast
multiplier on 64-bit, but fast multipliers are way more common than
the fast bit scan instructions.

I'd love to use a population count instruction, but efficient popc
instructions are simply not widely enough available. And the bsf
instruction is very slow on old x86 microarchitectures.

                  Linus
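A minimal sketch of the little-endian trick the mail refers to, assuming 64-bit words and C99. The `count_masked_bytes` name comes from the mail. Everything else (`load_le64`, `zero_byte_flags`, `create_zero_mask`, `word_strnlen`) is hypothetical and is not claimed to be the kernel's actual code. The multiplier `0x0001020304050608` is believed to match the 64-bit constant used in the tree, but treat that as an assumption. The sketch shows why byte order matters. The subtract-and-mask test can raise false flags only in bytes *more* significant than a real zero. On little-endian, those bytes come later in memory, so the lowest flag always marks the first NUL. A multiply then turns it into a byte index without BSF.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: assumes 64-bit words; helper names are hypothetical. */
#define ONE_BITS  0x0101010101010101ull
#define HIGH_BITS 0x8080808080808080ull

/* Load 8 bytes as a little-endian word, whatever the host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* Set 0x80 in each byte that may be zero. Borrows can only add false
 * flags in bytes more significant than a real zero, so the lowest flag
 * (the first byte in memory on little-endian) is always exact. */
static uint64_t zero_byte_flags(uint64_t a)
{
    return (a - ONE_BITS) & ~a & HIGH_BITS;
}

/* Keep only the bits below the lowest flag (bit 8k+7), then shift right
 * by 7 so the k bytes before the first zero become 0xff: 0x00..00ff..ff. */
static uint64_t create_zero_mask(uint64_t flags)
{
    return ((flags - 1) & ~flags) >> 7;
}

/* Count the 0xff bytes (0..7) in such a mask with one multiply and a
 * shift instead of a bit-scan instruction. */
static unsigned count_masked_bytes(uint64_t mask)
{
    return (unsigned)((mask * 0x0001020304050608ull) >> 56);
}

/* Hypothetical strnlen-like helper: buf must be readable for cap bytes
 * and cap must be a multiple of 8. Returns cap if no NUL is found. */
static size_t word_strnlen(const unsigned char *buf, size_t cap)
{
    for (size_t off = 0; off + 8 <= cap; off += 8) {
        uint64_t flags = zero_byte_flags(load_le64(buf + off));
        if (flags)
            return off + count_masked_bytes(create_zero_mask(flags));
    }
    return cap;
}
```

For contrast, with GCC or Clang the same index is `__builtin_ctzll(flags) >> 3`. That expression typically compiles to a bit-scan instruction (BSF or TZCNT on x86), which is the route the mail argues against on older x86 parts. The multiply in `count_masked_bytes` avoids that instruction but relies on a fast 64-bit multiplier.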