On Thu, Jan 26, 2006 at 11:03:54PM +0000, Russell King wrote: > Me too - already solved this problem once. However, I'd rather not > needlessly take a step backwards in the name of generic bitops. Indeed. However, I think we can actually improve bitops for some architectures. Here's what I've found so far: Versions of Alpha, ARM, MIPS, PowerPC and SPARC have bit counting instructions which we're using in most cases. I may have missed some: Alpha may have: ctlz, CounT Leading Zeros cttz, CounT Trailing Zeros ARM (since v5) has: clz, Count Leading Zeros MIPS may have: clz, Count Leading Zeros clo, Count Leading Ones PowerPC has: cntlz[wd], CouNT Leading Zeros (for Word/Double-word) SPARC v9 has: popc, POPulation Count PA-RISC has none. I've not checked any others. The Alpha, ARM and PowerPC functions look fine to me. On MIPS, fls() and flz() should probably use CLO. Curiously, MIPS is the only arch with a flz() function. On SPARC, the implementation of ffz() appears to be "cheese", and the proposed generic versions would be better. ffs() looks quite generic, and fls() uses the linux/bitops.h implementation. There are versions of hweight*() for sparc64 which use POPC when ULTRA_HAS_POPULATION_COUNT is defined, but AFAICS, it's never defined. The SPARC v9 arch manual recommends using popc(x ^ ~-x) for functions like ffs(). ffz() would return ffs(~x). I've had an idea for fls(): static inline int fls(unsigned long x) { x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; return popc(x); } I'm not sure how that compares to the generic fls(), but I suspect it's quite a bit faster. Unfortunately, I don't have any MIPS or SPARC v9 hardware to test this on. I'm not sure if this is of any use: static inline int __ffs(unsigned long x) { return (int)hweight_long(x ^ ~-x) - 1; } The idea being that the generic hweight_long has no branches. -- Stuart Brady