Re: [parisc-linux] Re: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h

Stuart Brady <sdbrady@xxxxxxxxxxxx> · Sun, 29 Jan 2006 07:12:42 +0000

On Thu, Jan 26, 2006 at 11:03:54PM +0000, Russell King wrote:
> Me too - already solved this problem once.  However, I'd rather not
> needlessly take a step backwards in the name of generic bitops.

Indeed.  However, I think we can actually improve bitops for some
architectures.  Here's what I've found so far:

Versions of Alpha, ARM, MIPS, PowerPC and SPARC have bit counting
instructions which we're using in most cases.  I may have missed some:

  Alpha may have:
    ctlz, CounT Leading Zeros
    cttz, CounT Trailing Zeros

  ARM (since v5) has:
    clz, Count Leading Zeros

  MIPS may have:
    clz, Count Leading Zeros
    clo, Count Leading Ones

  PowerPC has:
    cntlz[wd], CouNT Leading Zeros (for Word/Double-word)

  SPARC v9 has:
    popc, POPulation Count

PA-RISC has none.  I've not checked any others.

The Alpha, ARM and PowerPC functions look fine to me.

On MIPS, fls() and flz() should probably use CLO.  Curiously, MIPS is
the only arch with a flz() function.

On SPARC, the implementation of ffz() appears to be "cheese", and the
proposed generic versions would be better.  ffs() looks quite generic,
and fls() uses the linux/bitops.h implementation.

There are versions of hweight*() for sparc64 which use POPC when
ULTRA_HAS_POPULATION_COUNT is defined, but AFAICS, it's never defined.

The SPARC v9 arch manual recommends using popc(x ^ ~-x) for functions
like ffs().  ffz() would return ffs(~x).

I've had an idea for fls():

static inline int fls(unsigned long x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return popc(x);
}

I'm not sure how that compares to the generic fls(), but I suspect it's
quite a bit faster.  Unfortunately, I don't have any MIPS or SPARC v9
hardware to test this on.

I'm not sure if this is of any use:

static inline int __ffs(unsigned long x)
{
	return (int)hweight_long(x ^ ~-x) - 1;
}

The idea being that the generic hweight_long has no branches.
-- 
Stuart Brady