On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote: > This patch open codes the bitmap_weight() call. The direct > invocation of hweight_long() allows to remove the BUG_ON and > excessive "longs to bits, bits to longs" conversion. Honestly, that's not the problem with this function. Take a look at https://danluu.com/assembly-intrinsics/ for a _benchmarked_ set of problems with popcnt. > BUG_ON was required to check that bitmap_weight() will return > a correct value, i.e. the computed weight will fit the int type > of the return value. What? No. Look at the _arguments_ of bitmap_weight(): static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) > With this patch memweight() controls the > computation directly with size_t type everywhere. Thus, the BUG_ON > becomes unnecessary. Why are you bothering? How are you allocating half a gigabyte of memory? Why are you calling memweight() on half a gigabyte of memory? > if (longs) { > - BUG_ON(longs >= INT_MAX / BITS_PER_LONG); > - ret += bitmap_weight((unsigned long *)bitmap, > - longs * BITS_PER_LONG); > + const unsigned long *bitmap_long = > + (const unsigned long *)bitmap; > + > bytes -= longs * sizeof(long); > - bitmap += longs * sizeof(long); > + for (; longs > 0; longs--, bitmap_long++) > + ret += hweight_long(*bitmap_long); > + bitmap = (const unsigned char *)bitmap_long; > } If you really must change anything, I'd rather see this turned into a loop: while (longs) { unsigned int nbits; if (longs >= INT_MAX / BITS_PER_LONG) nbits = INT_MAX + 1; else nbits = longs * BITS_PER_LONG; ret += bitmap_weight((unsigned long *)bitmap, sz); bytes -= nbits / 8; bitmap += nbits / 8; longs -= nbits / BITS_PER_LONG; } then we only have to use Dan Luu's optimisation in bitmap_weight() and not in memweight() as well. Also, why does the trailer do this: for (; bytes > 0; bytes--, bitmap++) ret += hweight8(*bitmap); instead of calling hweight_long on *bitmap & mask?