Re: + bitops-optimize-fns-for-improved-performance.patch added to mm-nonmm-unstable branch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28/04/2024 18.08, Yury Norov wrote:
> + Rasmus
> 
> On Sat, Apr 27, 2024 at 01:33:55PM +0800, Kuan-Wei Chiu wrote:
>> Before:
>>                Start testing find_bit() with random-filled bitmap
>> [    0.299085] fbcon: Taking over console
>> [    0.299820] find_next_bit:                  606286 ns, 164169 iterations
>> [    0.300463] find_next_zero_bit:             641072 ns, 163512 iterations
>> [    0.300996] find_last_bit:                  531027 ns, 164169 iterations
>> [    0.305233] find_nth_bit:                  4235859 ns,  16454 iterations
>> [    0.306434] find_first_bit:                1199357 ns,  16455 iterations
>> [    0.321616] find_first_and_bit:           15179667 ns,  32869 iterations
>> [    0.321917] find_next_and_bit:              298836 ns,  73875 iterations
>> [    0.321918] 
>>                Start testing find_bit() with sparse bitmap
>> [    0.321953] find_next_bit:                    7931 ns,    656 iterations
>> [    0.323201] find_next_zero_bit:            1246980 ns, 327025 iterations
>> [    0.323210] find_last_bit:                    8000 ns,    656 iterations
>> [    0.324427] find_nth_bit:                  1213161 ns,    655 iterations
>> [    0.324813] find_first_bit:                 384747 ns,    656 iterations
>> [    0.324817] find_first_and_bit:               2220 ns,      1 iterations
>> [    0.324820] find_next_and_bit:                1831 ns,      1 iterations
>>
>> After:
>>                Start testing find_bit() with random-filled bitmap
>> [    0.305081] fbcon: Taking over console
>> [    0.306126] find_next_bit:                  854517 ns, 163960 iterations
>> [    0.307041] find_next_zero_bit:             911725 ns, 163721 iterations
>> [    0.307711] find_last_bit:                  668261 ns, 163960 iterations
>> [    0.311160] find_nth_bit:                  3447530 ns,  16372 iterations
>> [    0.312358] find_first_bit:                1196633 ns,  16373 iterations
>> [    0.327191] find_first_and_bit:           14830129 ns,  32951 iterations
>> [    0.327503] find_next_and_bit:              310560 ns,  73719 iterations
>> [    0.327504] 
>>                Start testing find_bit() with sparse bitmap
>> [    0.327539] find_next_bit:                    7633 ns,    656 iterations
>> [    0.328787] find_next_zero_bit:            1247398 ns, 327025 iterations
>> [    0.328797] find_last_bit:                    8425 ns,    656 iterations
>> [    0.330034] find_nth_bit:                  1234044 ns,    655 iterations
>> [    0.330428] find_first_bit:                 392086 ns,    656 iterations
>> [    0.330431] find_first_and_bit:               1980 ns,      1 iterations
>> [    0.330434] find_next_and_bit:                1831 ns,      1 iterations
>>
>> Some benchmarks seem to have worsened after applying this patch.
>> However, unless I'm mistaken, the fns() changes should only affect the
>> results of find_nth_bit, while the others are just random fluctuations.
> 
> So...
> 
> The patch itself looks good, 

Well, I think it could be even better. While I agree that bit=ffs();
clear_bit(bit, ) is probably a bad way of doing it, I think the basic
structure of the function is good. Introducing a "count from 0 up to n"
loop is rarely a good thing, keeping the n counting down to 0 is likely
better.

So I'd instead just change the function to

static inline unsigned long fns(unsigned long word, unsigned int n)
{
	while (word) {
		if (n-- == 0)
			return __ffs(word);
		word &= word - 1;
	}

	return BITS_PER_LONG;
}


Now that I look closer, I think the

 * Returns the bit number of the N'th set bit.
 * If no such, returns @size.

in the find_nth_bit and friends' docs is wrong. Tell me what happens here:


  DECLARE_BITMAP(x, 12);
  int i;

  x[0] = 3;
  i = find_nth_bit(&x, 12, 7);

So I'm asking for the seventh (counting from 0) bit set in a bitmap of
12 bits, but only two bits are set. So i should be 12? No, i will be
BITS_PER_LONG, because fns() doesn't know anything about the limit of
12. Do we really not have any tests covering that? Or indeed covering
any of the small_const_nbits optimizations?

I've said this before, and I repeat. It was a mistake for the bitmap
functions to promise a return of exactly @size when stuff is not found,
they should always have just promised to return something >= @size. The
checking in the callers would be just as easy (and some indeed do >=
instead of ==), but the implementation can be somewhat cheaper. I'm
afraid that ship has sailed, but it annoys me every time I stumble on this.

Rasmus





[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux