Re: [PATCH 3/4] maple_tree: use the correct min to calculate split

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 07, 2024 at 10:19:37PM -0500, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@xxxxxxxxx> [241107 21:55]:
>> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote:
>> >* Wei Yang <richard.weiyang@xxxxxxxxx> [241019 22:46]:
>> >> The check in mab_calc_split() is to make sure the gap between
>> >> [0, split] won't be too small. But we don't pass the correct min.
>> >> 
>> >> First we may encounter a pivot[split] smaller than min. For example:
>> >> 
>> >>        mt_init_flags(mt, 0);
>> >>        for (count = 0; count <= 240; count++) {
>> >>                mas_set(&mas, count);
>> >>                mas_store(&mas, xa_mk_value(count));
>> >>        }
>> >> 
>> >> On splitting root for storing 240, the pivot[split] is smaller than min.
>> >> This result a huge (pivot[split] - min).
>> >
>> >This is fine.
>> >
>> >There is an open work item to make it more accurate at higher levels,
>> >but it's not a problem as it is.
>> >
>> >Each level upwards needs a better 'minimum span', meaning that the node
>> >should have at least mas.min - mas.min + level * something.
>> 
>> Don't follow this. 
>> 
>> First, I guess it is mas.max - mas.min?
>> 
>
>No, the idea is that we could do better if we could keep the data more
>densely packed.
>
>If we have singletons (all range of 1), and we have 10,000 singletons,
>then a parent node with 0 - 128 and a sibling of 129-257 makes no sense,
>there is nothing that can be inserting into the parents last two slots
>- we can't split singletons smaller so we should have done a better job
>at splitting.
>
>If you think of how this grows today, we split then rebalance until the
>left node is full and we have to split again.  This could be better
>optimised to avoid multiple rebalances by doing something to push more
>data to the left based on the span of the children and the nodes.  But
>we'd want to keep enough so some modifications to each node doesn't
>cause a rebalance in the opposite direction.
>
>So a minimum span of parent nodes one level up from the leaves would be
>16 * 10 = 160, so when we split we can push more to the left to try and
>get to 160 while keeping the right node sufficient.  The math of the
>span would change based on how far up the tree we go, but would be
>easier to maintain.
>
>I'd probably still want to split it 50/50 the first time, but it might
>make sense to push more on the next rebalance.  so 10 => 6|5, 6|10 =>
>9|5, but maybe it's not worth the effort.  I'm starting to think so,
>considering it's been ineffective to date.  There is a chance that the
>active area gets moved to the more-full node and ends up splitting
>sooner anyways..
>
>All of this will probably change when dense nodes come anyways.
>
>> Then there is sth related to level. But I don't see level is used for
>> calculation. Would you mind sharing more your original thought?
>> 
>> >
>> >It works today for leaves, somewhat.
>> >
>> 
>> To me it works by coincidence.
>
>Seems like it.
>
>> 
>> The condition check is:
>> 
>>   (bn->pivot[split] - min) < (slot_count - 1) 
>> 
>> while slot_count is a constent, 16 for leaf node.
>> 
>> And we always pass 0 as min, the condition for leaf is:
>> 
>>   bn->pivot[split] < 15
>> 
>> For the mmap world, per my understanding, we won't expect an address is smaller
>> than 15.
>
>There are other users, but I think we should just drop this attempt at
>fancy stuff.
>

You mean "set mid_split = 0 and drop the min argument all together" mentioned
in another mail?

>> 
>> >> 
>> >> Second prev_l_mas.min is not initialized for the first iteration. This
>> >> means on splitting leaf node, this value is mostly taking no effect.
>> >
>> >No, it is set to 0.  Not initialized would cause random data loss.
>> >
>> >See MA_STATE() in the header.
>> >
>> 
>> Right, I want to say not initialized to the value we want.
>> 
>> Will be more careful next time.
>> 
>> >> 
>> >> Since we are now calculating the split of mas->node, we should use the
>> >> mas->min instead of prev_l_mas.min.
>> >
>> >This sounds reasonable, but considering what this number is used for, I
>> >don't see how it is working as well as it is today.  I will need to look
>> >deeper into this.
>> >
>> >> 
>> >> Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxx>
>> >> CC: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
>> >> CC: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
>> >> CC: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
>> >> ---
>> >>  lib/maple_tree.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> 
>> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> >> index 894dc5e9436e..c2d4b188646c 100644
>> >> --- a/lib/maple_tree.c
>> >> +++ b/lib/maple_tree.c
>> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
>> >>  		if (mas_push_data(mas, height, &mast, false))
>> >>  			break;
>> >>  
>> >> -		split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
>> >> +		split = mab_calc_split(mas, b_node, &mid_split, mas->min);
>> >>  		mast_split_data(&mast, mas, split);
>> >>  		/*
>> >>  		 * Usually correct, mab_mas_cp in the above call overwrites
>> >> -- 
>> >> 2.34.1
>> >> 
>> 
>> -- 
>> Wei Yang
>> Help you, Help me

-- 
Wei Yang
Help you, Help me




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux