On Thu, Nov 07, 2024 at 10:19:37PM -0500, Liam R. Howlett wrote: >* Wei Yang <richard.weiyang@xxxxxxxxx> [241107 21:55]: >> On Sun, Oct 20, 2024 at 06:00:35PM -0400, Liam R. Howlett wrote: >> >* Wei Yang <richard.weiyang@xxxxxxxxx> [241019 22:46]: >> >> The check in mab_calc_split() is to make sure the gap between >> >> [0, split] won't be too small. But we don't pass the correct min. >> >> >> >> First we may encounter a pivot[split] smaller than min. For example: >> >> >> >> mt_init_flags(mt, 0); >> >> for (count = 0; count <= 240; count++) { >> >> mas_set(&mas, count); >> >> mas_store(&mas, xa_mk_value(count)); >> >> } >> >> >> >> On splitting root for storing 240, the pivot[split] is smaller than min. >> >> This result a huge (pivot[split] - min). >> > >> >This is fine. >> > >> >There is an open work item to make it more accurate at higher levels, >> >but it's not a problem as it is. >> > >> >Each level upwards needs a better 'minimum span', meaning that the node >> >should have at least mas.min - mas.min + level * something. >> >> Don't follow this. >> >> First, I guess it is mas.max - mas.min? >> > >No, the idea is that we could do better if we could keep the data more >densely packed. > >If we have singletons (all range of 1), and we have 10,000 singletons, >then a parent node with 0 - 128 and a sibling of 129-257 makes no sense, >there is nothing that can be inserting into the parents last two slots >- we can't split singletons smaller so we should have done a better job >at splitting. > >If you think of how this grows today, we split then rebalance until the >left node is full and we have to split again. This could be better >optimised to avoid multiple rebalances by doing something to push more >data to the left based on the span of the children and the nodes. But >we'd want to keep enough so some modifications to each node doesn't >cause a rebalance in the opposite direction. > >So a minimum span of parent nodes one level up from the leaves would be >16 * 10 = 160, so when we split we can push more to the left to try and >get to 160 while keeping the right node sufficient. The math of the >span would change based on how far up the tree we go, but would be >easier to maintain. > >I'd probably still want to split it 50/50 the first time, but it might >make sense to push more on the next rebalance. so 10 => 6|5, 6|10 => >9|5, but maybe it's not worth the effort. I'm starting to think so, >considering it's been ineffective to date. There is a chance that the >active area gets moved to the more-full node and ends up splitting >sooner anyways.. > >All of this will probably change when dense nodes come anyways. > >> Then there is sth related to level. But I don't see level is used for >> calculation. Would you mind sharing more your original thought? >> >> > >> >It works today for leaves, somewhat. >> > >> >> To me it works by coincidence. > >Seems like it. > >> >> The condition check is: >> >> (bn->pivot[split] - min) < (slot_count - 1) >> >> while slot_count is a constent, 16 for leaf node. >> >> And we always pass 0 as min, the condition for leaf is: >> >> bn->pivot[split] < 15 >> >> For the mmap world, per my understanding, we won't expect an address is smaller >> than 15. > >There are other users, but I think we should just drop this attempt at >fancy stuff. > You mean "set mid_split = 0 and drop the min argument all together" mentioned in another mail? >> >> >> >> >> Second prev_l_mas.min is not initialized for the first iteration. This >> >> means on splitting leaf node, this value is mostly taking no effect. >> > >> >No, it is set to 0. Not initialized would cause random data loss. >> > >> >See MA_STATE() in the header. >> > >> >> Right, I want to say not initialized to the value we want. >> >> Will be more careful next time. >> >> >> >> >> Since we are now calculating the split of mas->node, we should use the >> >> mas->min instead of prev_l_mas.min. >> > >> >This sounds reasonable, but considering what this number is used for, I >> >don't see how it is working as well as it is today. I will need to look >> >deeper into this. >> > >> >> >> >> Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxx> >> >> CC: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> >> >> CC: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx> >> >> CC: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> >> >> --- >> >> lib/maple_tree.c | 2 +- >> >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> >> index 894dc5e9436e..c2d4b188646c 100644 >> >> --- a/lib/maple_tree.c >> >> +++ b/lib/maple_tree.c >> >> @@ -3357,7 +3357,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node) >> >> if (mas_push_data(mas, height, &mast, false)) >> >> break; >> >> >> >> - split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min); >> >> + split = mab_calc_split(mas, b_node, &mid_split, mas->min); >> >> mast_split_data(&mast, mas, split); >> >> /* >> >> * Usually correct, mab_mas_cp in the above call overwrites >> >> -- >> >> 2.34.1 >> >> >> >> -- >> Wei Yang >> Help you, Help me -- Wei Yang Help you, Help me