On Fri, Mar 2, 2012 at 3:46 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Here's a new version of that patch.
>
> It's based on the cleanups I committed and pushed out today, so the base
> may not look familiar, but the upside is that this does the configuration
> automatically (currently the patch enables the word accesses on x86 by
> default as long as DEBUG_PAGEALLOC isn't set).
>
> I worked around my problems with stupid branch prediction on the '/' test
> at the end by just reorganizing the code a bit, and it actually all just
> looks cleaner.
>
> This *does* assume that "bsf" is a reasonably fast instruction, which is
> not necessarily the case especially on 32-bit x86. So the config option
> choice for this might want some tuning even on x86, but it would be lovely
> to get comments and have people test it out on older hardware.

There is an improvement you can make to this. "bsf" is microcoded in
many future CPUs (e.g. Piledriver) in favor of tzcnt, which is part of
BMI1, has slightly different flag behavior, and has no undefined
behavior (bsf leaves its destination undefined for a zero input). The
microcoded bsf costs a few clocks in such chips -- not as bad as the
Really Slow bsf in chips like the Pentium 4 and Atom, but more than is
necessary (and it stalls the instruction decoder, at least on AMD,
while the microcode unit runs).

However, "tzcnt" is opcode-equivalent to "rep bsf", for some odd sort
of backwards compatibility. Therefore, if your code does the same
thing with both tzcnt and bsf, you can simply use rep bsf instead, and
it'll work without any CPU checks. The manuals claim that this works
on legacy CPUs, i.e. won't SIGILL.

The only downside I've noticed to this is that valgrind incorrectly
SIGILLs on rep bsf.

Jason
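
For illustration, here is a minimal sketch of the rep bsf trick as
GCC-style inline asm. The helper name first_set_bit() is hypothetical
and is not taken from the patch under discussion. The sketch assumes
the caller never passes zero, since that is the one input where bsf
and tzcnt disagree on the result, and that nothing downstream depends
on the flags. The non-x86 branch exists only so the sketch builds
elsewhere.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): index of the lowest set
 * bit in a non-zero word.
 */
static inline unsigned long first_set_bit(unsigned long word)
{
	/*
	 * For a zero input, bsf leaves the destination undefined while
	 * tzcnt returns the operand width, so the two instructions only
	 * agree on non-zero input.
	 */
	assert(word != 0);

#if defined(__x86_64__) || defined(__i386__)
	unsigned long bit;

	/*
	 * "rep; bsf" assembles to the same bytes as tzcnt. CPUs with
	 * BMI1 execute it as tzcnt; older CPUs are documented to ignore
	 * the prefix and execute a plain bsf.
	 */
	__asm__("rep; bsf %1, %0" : "=r" (bit) : "r" (word) : "cc");
	return bit;
#else
	/* Assumption: GCC/clang builtin as a stand-in on non-x86 targets. */
	return (unsigned long)__builtin_ctzl(word);
#endif
}
```

Because the flags differ between the two instructions, the asm only
declares a "cc" clobber and exposes the destination register, not the
flag state.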