On Fri, Mar 26, 2010 at 10:23:46AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 26 Mar 2010, David Howells wrote:
> >
> > fls(N), ffs(N) and fls64(N) can be optimised on x86/x86_64. Currently they
> > perform checks against N being 0 before invoking the BSR/BSF instruction, or
> > use a CMOV instruction afterwards. Either the check involves a conditional
> > jump which we'd like to avoid, or a CMOV, which we'd also quite like to avoid.
> >
> > Instead, we can make use of the fact that BSR/BSF doesn't modify its output
> > register if its input is 0. By preloading the output with -1 and incrementing
> > the result, we achieve the desired result without the need for a conditional
> > check.
>
> This is totally incorrect.
>
> Where did you find that "doesn't modify its output" thing? It's not true.
> The truth is that the destination is undefined. Just read the dang Intel
> documentation, it's very clearly stated right there.

While this is true for the current (253666-031US) Intel documentation, the
AMD documentation (rev 3.14) for the same instruction states that the
destination register is unchanged (as opposed to Intel's undefined).

I wonder if Intel's EM64T stuff makes this more deterministic; perhaps
David's implementation would work for x86_64 only?

scott
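
For reference, the preload trick under discussion can be sketched in portable C. This is only a model, not the kernel code or David's patch: bsr_model() is a hypothetical helper that simulates the "destination unchanged when the source is 0" behaviour the AMD manual describes. Intel documents that case as undefined, so nothing here shows the trick is safe on real hardware.

```c
#include <stdint.h>

/*
 * Hypothetical software model of BSR, following the AMD wording:
 * if src is 0 the destination keeps its previous value, otherwise it
 * receives the index of the highest set bit. Intel documents the
 * src == 0 result as undefined, so this is an assumption.
 */
static int bsr_model(uint32_t src, int dst)
{
    int idx = 0;

    if (src == 0)
        return dst;         /* "destination unchanged" case */

    while (src >>= 1)       /* count shifts until the value is exhausted */
        idx++;
    return idx;
}

/* fls() with the explicit zero check the patch wants to avoid. */
static int fls_checked(uint32_t x)
{
    if (x == 0)
        return 0;
    return bsr_model(x, 0) + 1;
}

/* Proposed variant: preload the output with -1, then increment. */
static int fls_preload(uint32_t x)
{
    return bsr_model(x, -1) + 1;
}
```

In this model both variants agree for every input, including 0. Whether that holds on a given CPU depends entirely on what BSR really does with a zero source, which is the point in dispute above.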