Re: + bitops-optimize-fns-for-improved-performance.patch added to mm-nonmm-unstable branch

Kuan-Wei Chiu <visitorckw@xxxxxxxxx> · Mon, 29 Apr 2024 23:40:34 +0800

On Mon, Apr 29, 2024 at 10:05:12AM +0200, Rasmus Villemoes wrote:
> On 28/04/2024 18.08, Yury Norov wrote:
> > + Rasmus
> > 
> > On Sat, Apr 27, 2024 at 01:33:55PM +0800, Kuan-Wei Chiu wrote:
> >> Before:
> >>                Start testing find_bit() with random-filled bitmap
> >> [    0.299085] fbcon: Taking over console
> >> [    0.299820] find_next_bit:                  606286 ns, 164169 iterations
> >> [    0.300463] find_next_zero_bit:             641072 ns, 163512 iterations
> >> [    0.300996] find_last_bit:                  531027 ns, 164169 iterations
> >> [    0.305233] find_nth_bit:                  4235859 ns,  16454 iterations
> >> [    0.306434] find_first_bit:                1199357 ns,  16455 iterations
> >> [    0.321616] find_first_and_bit:           15179667 ns,  32869 iterations
> >> [    0.321917] find_next_and_bit:              298836 ns,  73875 iterations
> >> [    0.321918] 
> >>                Start testing find_bit() with sparse bitmap
> >> [    0.321953] find_next_bit:                    7931 ns,    656 iterations
> >> [    0.323201] find_next_zero_bit:            1246980 ns, 327025 iterations
> >> [    0.323210] find_last_bit:                    8000 ns,    656 iterations
> >> [    0.324427] find_nth_bit:                  1213161 ns,    655 iterations
> >> [    0.324813] find_first_bit:                 384747 ns,    656 iterations
> >> [    0.324817] find_first_and_bit:               2220 ns,      1 iterations
> >> [    0.324820] find_next_and_bit:                1831 ns,      1 iterations
> >>
> >> After:
> >>                Start testing find_bit() with random-filled bitmap
> >> [    0.305081] fbcon: Taking over console
> >> [    0.306126] find_next_bit:                  854517 ns, 163960 iterations
> >> [    0.307041] find_next_zero_bit:             911725 ns, 163721 iterations
> >> [    0.307711] find_last_bit:                  668261 ns, 163960 iterations
> >> [    0.311160] find_nth_bit:                  3447530 ns,  16372 iterations
> >> [    0.312358] find_first_bit:                1196633 ns,  16373 iterations
> >> [    0.327191] find_first_and_bit:           14830129 ns,  32951 iterations
> >> [    0.327503] find_next_and_bit:              310560 ns,  73719 iterations
> >> [    0.327504] 
> >>                Start testing find_bit() with sparse bitmap
> >> [    0.327539] find_next_bit:                    7633 ns,    656 iterations
> >> [    0.328787] find_next_zero_bit:            1247398 ns, 327025 iterations
> >> [    0.328797] find_last_bit:                    8425 ns,    656 iterations
> >> [    0.330034] find_nth_bit:                  1234044 ns,    655 iterations
> >> [    0.330428] find_first_bit:                 392086 ns,    656 iterations
> >> [    0.330431] find_first_and_bit:               1980 ns,      1 iterations
> >> [    0.330434] find_next_and_bit:                1831 ns,      1 iterations
> >>
> >> Some benchmarks seem to have worsened after applying this patch.
> >> However, unless I'm mistaken, the fns() changes should only affect the
> >> results of find_nth_bit, while the others are just random fluctuations.
> > 
> > So...
> > 
> > The patch itself looks good, 
> 
> Well, I think it could be even better. While I agree that bit=ffs();
> clear_bit(bit, ) is probably a bad way of doing it, I think the basic
> structure of the function is good. Introducing a "count from 0 up to n"
> loop is rarely a good thing, keeping the n counting down to 0 is likely
> better.
> 
> So I'd instead just change the function to
> 
> static inline unsigned long fns(unsigned long word, unsigned int n)
> {
> 	while (word) {
> 		if (n-- == 0)
> 			return __ffs(word);
> 		word &= word - 1;
> 	}
> 
> 	return BITS_PER_LONG;
> }
>
How about rewriting it as follows:

static inline unsigned long fns(unsigned long word, unsigned int n)
{
    while (word && n--)
        word &= word - 1;

    return word ? __ffs(word) : BITS_PER_LONG;
}

IMHO, this way the code can be shorter and more elegant.

Regards,
Kuan-Wei

> 
> Now that I look closer, I think the
> 
>  * Returns the bit number of the N'th set bit.
>  * If no such, returns @size.
> 
> in the find_nth_bit and friends' docs is wrong. Tell me what happens here:
> 
> 
>   DECLARE_BITMAP(x, 12);
>   int i;
> 
>   x[0] = 3;
>   i = find_nth_bit(&x, 12, 7);
> 
> So I'm asking for the seventh (counting from 0) bit set in a bitmap of
> 12 bits, but only two bits are set. So i should be 12? No, i will be
> BITS_PER_LONG, because fns() doesn't know anything about the limit of
> 12. Do we really not have any tests covering that? Or indeed covering
> any of the small_const_nbits optimizations?
> 
> I've said this before, and I repeat. It was a mistake for the bitmap
> functions to promise a return of exactly @size when stuff is not found,
> they should always have just promised to return something >= @size. The
> checking in the callers would be just as easy (and some indeed do >=
> instead of ==), but the implementation can be somewhat cheaper. I'm
> afraid that ship has sailed, but it annoys me every time I stumble on this.
> 
> Rasmus
>