On Thu, Apr 11, 2013 at 11:07 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> Oh, I (wrongly it appears) assumed that fls was something cheap :/

It often is. Particularly on modern machines, because popcount and leading-zero counting have ended up being interesting to some people.

On older machines, it's often a bit-at-a-time thing. We don't even try to support i386 any more, but on Atom and P4 it's something like 16 cycles for bsrl, and older cores were worse. So doing three of them when not needed seems a bit excessive..

In contrast, on a Core2, I think it's just a single cycle.

Non-x86 architectures end up in the same situation: some have fast instructions for it, others don't have one at all and end up doing it with bitmasking and shifting.

                Linus
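For architectures without a fast instruction, the bitmasking-and-shifting fallback can be sketched as a binary search over the word. This is a minimal illustration, not the kernel's actual code: `generic_fls()` and `builtin_fls()` are hypothetical names, the sketch assumes a 32-bit `unsigned int`, and it follows the usual fls() convention of returning the 1-based position of the highest set bit (0 for an input of 0). The second helper shows how GCC and Clang expose a hardware leading-zero count through `__builtin_clz()`, which is undefined for 0 and therefore needs an explicit check.

```c
/* Hypothetical portable fls(): 1-based index of the most significant
 * set bit, or 0 if x == 0. Assumes a 32-bit unsigned int.
 * Narrows the search window by 16, 8, 4, 2 and 1 bits using masks
 * and shifts instead of a hardware instruction such as bsr. */
static int generic_fls(unsigned int x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) {   /* top 16 bits empty: look lower */
        x <<= 16;
        r -= 16;
    }
    if (!(x & 0xff000000u)) {   /* top 8 bits empty */
        x <<= 8;
        r -= 8;
    }
    if (!(x & 0xf0000000u)) {   /* top 4 bits empty */
        x <<= 4;
        r -= 4;
    }
    if (!(x & 0xc0000000u)) {   /* top 2 bits empty */
        x <<= 2;
        r -= 2;
    }
    if (!(x & 0x80000000u))     /* top bit empty */
        r -= 1;
    return r;
}

/* Hypothetical fast-path equivalent: on targets with a leading-zero
 * count instruction, the compiler typically emits a single instruction
 * for __builtin_clz(). It is undefined for 0, hence the check. */
static int builtin_fls(unsigned int x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}
```

The fallback always performs five test-and-shift steps regardless of input, so it is noticeably more work than a single-cycle bsr or clz, though it avoids the worst case of a bit-at-a-time loop.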