RE: [RFC 00/10] implement alternative and much simpler id allocator

Matthew Wilcox <mawilcox@xxxxxxxxxxxxx> · Fri, 23 Dec 2016 17:03:51 +0000

From: Rasmus Villemoes [mailto:linux@xxxxxxxxxxxxxxxxxx]
> Nice work! A few random comments/questions:
> 
> - It does add some complexity, but I think a few comments would make it
>   more digestable.

I'm open to adding some comments ... I need some time between writing the code and writing the comments to be sure what comments are useful.

> - Hm, maybe I'm confused, and I certainly don't understand how the whole
>   radix tree works. But do you use every leaf node as an exceptional
>   entry initially, to allocate the first 62 ids from that level? This
>   code

I do!  And that question tells me one useful comment to add!

> 	if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) <
> 					BITS_PER_LONG) {
> 		bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
> 		radix_tree_iter_replace(root, &iter, slot,
> 				(void *)((1UL << bit) |
> 				RADIX_TREE_EXCEPTIONAL_ENTRY));
> 		*id = new;
> 		return 0;
> 	}
> 
>    operates on bit, which is the offset from index*IDA_BITMAP_BITS, and
>    it seems to create an exceptional entry somewhere down the tree
>    (which may of course be the root).
> 
>    But we don't seem to allocate another bit from that exceptional entry
>    ever unless it happened to sit at index 0; the code
> 
> 	unsigned long tmp = (unsigned long)bitmap;
> 	if (start < BITS_PER_LONG) {
> 		unsigned tbit = find_next_zero_bit(&tmp,
> 					BITS_PER_LONG, start);
> 		if (tbit < BITS_PER_LONG) {
> 			tmp |= 1UL << tbit;
> 			rcu_assign_pointer(*slot, (void *)tmp);
> 			*id = new + tbit -
> 				RADIX_TREE_EXCEPTIONAL_SHIFT;
> 			return 0;
> 		}
> 	}
> 
>    is only used for small values of start (though it does seem to
>    account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).

Ahh.  You're reading the code wrong, and I wrote the code wrongly too, so it's clearly overly complex.  I was testing with 'start' set to 0, allocating N entries, and then freeing them.  If I'd set start to 1024 and allocated two entries, I'd've seen the failure.

Please see the top commit here ("Improve IDA exceptional entry handling"): http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-20

> - In the micro-optimization department, I think we should avoid
>   find_next_zero_bit on a single word, and instead use __ffs. Something
>   like (assuming we can use bit instead of start)
> 
>   if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) {
>     tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT);
>     if (tmp) {
>       tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
>       tmp = (unsigned long)bitmap | (1UL << tbit);
>       rcu_assign_pointer(*slot, (void *)tmp);
>       *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT;
>       return 0;
>     }
>   }

I'm certainly open to microoptimisations, but I'll have to think about this one for a bit.
��.n��������+%������w��{.n�����{����n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�