Hi,

Sorry for reviving this conversation, but it looks to me like this function
could be reduced to a single bitmap_weight call:

	static inline size_t memweight(const void *ptr, size_t bytes)
	{
		BUG_ON(bytes >= UINT_MAX / BITS_PER_BYTE);
		return bitmap_weight(ptr, bytes * BITS_PER_BYTE);
	}

Compared to the current implementation
https://elixir.bootlin.com/linux/latest/source/lib/memweight.c#L11
this is a significant simplification.

__bitmap_weight already counts the trailing bits with hweight_long, as we
discussed earlier:

	int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
	{
		...
		if (bits % BITS_PER_LONG)
			w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
		...
	}

and the __arch_hweight* functions use the popcnt instruction.

I've briefly tested the equivalence of the two implementations on x86_64
with fuzzing here:
https://gist.github.com/evdenis/95a8b9b8041e09368b31c3a9510491a5

What do you think about making this function static inline and moving it
to include/linux/string.h? I could prepare a patch for it and add some
tests for memweight and bitmap_weight.

Or maybe I am missing something again?

Best regards,
Denis
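
P.S. For illustration only, here is a minimal userspace sketch of the kind of equivalence check described above. It is not the code from the gist and not kernel code. The names memweight_bytewise, bitmap_weight_sketch and weights_agree are hypothetical. It assumes GCC or Clang (for the __builtin_popcount* builtins) and a little-endian machine such as x86_64, where the low bits of the last word correspond to the first remaining bytes in memory. The buffer is copied into an aligned array padded with 0xff, so the check only passes if the last-word mask really excludes the bits beyond the requested length.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Reference: count set bits one byte at a time. */
static size_t memweight_bytewise(const void *ptr, size_t bytes)
{
	const unsigned char *p = ptr;
	size_t w = 0;

	for (size_t i = 0; i < bytes; i++)
		w += (size_t)__builtin_popcount(p[i]);
	return w;
}

/* Word-based count modelled on the __bitmap_weight excerpt (a sketch). */
static size_t bitmap_weight_sketch(const unsigned long *bitmap, size_t bits)
{
	size_t k, w = 0;

	for (k = 0; k < bits / BITS_PER_LONG; k++)
		w += (size_t)__builtin_popcountl(bitmap[k]);
	if (bits % BITS_PER_LONG) {
		/* Keep only the low (bits % BITS_PER_LONG) bits of the last word. */
		unsigned long mask = ~0UL >> (BITS_PER_LONG - bits % BITS_PER_LONG);

		w += (size_t)__builtin_popcountl(bitmap[k] & mask);
	}
	return w;
}

/* Hypothetical helper: returns 1 if both counts agree (bytes <= 64). */
static int weights_agree(const unsigned char *buf, size_t bytes)
{
	unsigned long words[64 / sizeof(unsigned long) + 1];

	assert(bytes <= 64);
	/* Pad with 0xff so any bit leaking past the mask changes the count. */
	memset(words, 0xff, sizeof(words));
	memcpy(words, buf, bytes);
	return memweight_bytewise(buf, bytes) ==
	       bitmap_weight_sketch(words, bytes * CHAR_BIT);
}
```

Calling weights_agree() on random buffers of every length from 0 to 64 bytes would exercise both the whole-word loop and the masked last word. Note that the sketch copies the data first, so it says nothing about unaligned pointers or big-endian layouts.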