On 2/25/25 11:02 AM, Liam R. Howlett wrote:
* Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx> [250221 11:36]:
If a parent node is vacant but holds mt_min_slots + 1 entries,
rebalancing with a leaf node could cause this parent node to become
insufficient. This will lead to another level of rebalancing in the tree
and requires more node allocations. Therefore, we also have to track the
level at which there is a node with > mt_min_slots entries. We can use
this as the worst case for the spanning and rebalancing stores.
This may not explain the situation fully. We also have to track the last
level at which there is a node that will not become insufficient. We
know that during rebalance, the number of entries in a non-leaf node may
decrease by one. Tracking the last node that will remain sufficient, and
so stop the cascading operation, lets us reduce the number of nodes
preallocated for the operation.
Note that this can happen at any level of an operation and not just a
node containing leaves.
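As a rough illustration of the tracking described above, here is a minimal
standalone sketch that mirrors the checks in the mas_wr_walk() hunk further
down. It is not kernel code: struct level_info, struct heights and
track_heights() are hypothetical names, the path array is assumed to hold
only the non-leaf nodes visited on the way down (index 0 being the root),
and the slots/min_slots fields stand in for mt_slots[] and mt_min_slots[].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one non-leaf node visited during the walk. */
struct level_info {
	unsigned char end;       /* offset of the last used slot, like mas->end */
	unsigned char slots;     /* stand-in for mt_slots[type] */
	unsigned char min_slots; /* stand-in for mt_min_slots[type] */
};

/* Hypothetical holder for the two tracked values. */
struct heights {
	unsigned char vacant_height;
	unsigned char sufficient_height;
};

/*
 * Walk from the root (index 0) downwards and remember the lowest level
 * that has free space and the lowest level that can lose one entry
 * without becoming insufficient.
 */
static struct heights track_heights(const struct level_info *path,
				    size_t depth_count)
{
	struct heights h = { 0, 0 };
	size_t depth;

	assert(path != NULL || depth_count == 0);

	for (depth = 0; depth < depth_count; depth++) {
		const struct level_info *n = &path[depth];

		/* Node has room for at least one more entry. */
		if (n->end < n->slots - 1)
			h.vacant_height = (unsigned char)(depth + 1);

		if (depth == 0) {
			/* Root uses a fixed threshold, as in the patch. */
			if (n->end > 2)
				h.sufficient_height = 1;
		} else if (n->end > n->min_slots + 1) {
			h.sufficient_height = (unsigned char)(depth + 1);
		}
	}

	return h;
}
```

With a three-level path whose middle node is full and whose lowest node sits
exactly at min_slots + 1, the sketch reports a vacant height of 3 but a
sufficient height of only 2, which is the case where the vacant height alone
would under-allocate.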
The spanning store operation can also be treated the same because the
walk down the tree stops when it is detected. That means the location
of the walk that detects the spanning store may be reduced to be
insufficient and will be rebalanced or may be split and need to absorb
up to two entries.
I think this commit needs some more text explaining these changes.
Does this commit message work better?
Using vacant height to reduce the worst case maple node allocation count
can lead to a shortage of nodes in the following scenarios.
For rebalancing writes, when a leaf node becomes insufficient, we push
the now insufficient number of entries into a sibling node. This means
that the parent node which has entries for these children will lose one
entry. If this parent node was only sufficient because it had the
minimum number of entries to be sufficient, losing one entry will now
cause this parent node to be insufficient. This leads to a cascading
operation of rebalancing at different levels and can lead to more node
allocations than simply using vacant height can return.
For spanning writes, a similar situation occurs. At the location at
which a spanning write is detected, the number of ancestor nodes may
similarly need to be rebalanced into a smaller number of nodes and the same
cascading situation could occur.
To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient. This means even if a rebalance occurs to a
child of this node, it currently has enough entries that losing one
entry will not cause this node to be insufficient. This field is stored
in the maple write state as sufficient height. In mas_prealloc_calc()
when figuring out how many nodes to allocate, we check if the vacant
node is lower in the tree than a sufficient node (has a larger value).
If it is, we cannot use the vacant height and must use the difference
between the height and the sufficient height as the basis for the number
of nodes
needed.
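For reference, the selection described above can be sketched outside the
kernel as follows. This is only an illustration under stated assumptions:
struct wr_heights and prealloc_levels_to_nodes() are hypothetical names, and
delta is assumed to be height minus vacant_height, which the hunk below uses
but does not define. The per-level multiplier is 3 for a spanning store and
2 for a rebalance, as in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant write-state fields. */
struct wr_heights {
	unsigned char height;            /* current tree height */
	unsigned char vacant_height;     /* lowest level with free space */
	unsigned char sufficient_height; /* lowest level that can lose an entry */
};

/*
 * Return the worst-case node count for an operation that may touch
 * nodes_per_level nodes on each affected level, plus one extra node.
 */
static int prealloc_levels_to_nodes(const struct wr_heights *h,
				    int nodes_per_level)
{
	int levels;

	assert(h->vacant_height <= h->height);
	assert(h->sufficient_height <= h->height);

	if (h->sufficient_height < h->vacant_height) {
		/* The vacant node may still become insufficient. */
		levels = h->height - h->sufficient_height;
	} else {
		/* Assumed definition of delta: height - vacant_height. */
		levels = h->height - h->vacant_height;
	}

	return levels * nodes_per_level + 1;
}
```

For a height 4 tree with vacant_height 3 and sufficient_height 1, a
rebalance would reserve (4 - 1) * 2 + 1 = 7 nodes rather than the 3 that the
vacant height alone would suggest.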
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
---
include/linux/maple_tree.h | 4 +++-
lib/maple_tree.c | 17 +++++++++++++++--
tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
3 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 7d777aa2d9ed..37dc9525dff6 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -464,6 +464,7 @@ struct ma_wr_state {
void *entry; /* The entry to write */
void *content; /* The existing entry that is being overwritten */
unsigned char vacant_height; /* Depth of lowest node with free space */
+ unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
};
#define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock))
@@ -499,7 +500,8 @@ struct ma_wr_state {
.mas = ma_state, \
.content = NULL, \
.entry = wr_entry, \
- .vacant_height = 0 \
+ .vacant_height = 0, \
+ .sufficient_height = 0 \
}
#define MA_TOPIARY(name, tree) \
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4de257003251..8fdd3f477198 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
if (mas->end < mt_slots[wr_mas->type] - 1)
wr_mas->vacant_height = mas->depth + 1;
+ if (ma_is_root(mas_mn(mas))) {
+ /* root needs more than 2 entries to be sufficient + 1 */
+ if (mas->end > 2)
+ wr_mas->sufficient_height = 1;
+ } else if (mas->end > mt_min_slots[wr_mas->type] + 1)
+ wr_mas->sufficient_height = mas->depth + 1;
+
mas_wr_walk_traverse(wr_mas);
}
@@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
ret = 0;
break;
case wr_spanning_store:
- ret = height * 3 + 1;
+ if (wr_mas->sufficient_height < wr_mas->vacant_height)
+ ret = (height - wr_mas->sufficient_height) * 3 + 1;
+ else
+ ret = delta * 3 + 1;
Ah, ret was short lived. Okay.
I still think this stuff needs some more context in the commit message.
break;
case wr_split_store:
ret = delta * 2 + 1;
break;
case wr_rebalance:
- ret = height * 2 + 1;
+ if (wr_mas->sufficient_height < wr_mas->vacant_height)
+ ret = (height - wr_mas->sufficient_height) * 2 + 1;
+ else
+ ret = delta * 2 + 1;
break;
case wr_node_store:
ret = mt_in_rcu(mas->tree) ? 1 : 0;
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index d22c1008dffe..d40f70671cb8 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
extern void test_kmem_cache_bulk(void);
+/*
+ * Test to check the path of a spanning rebalance which results in
+ * a collapse where the rebalancing of the child node leads to
+ * insufficiency in the parent node.
+ */
+static void check_collapsing_rebalance(struct maple_tree *mt)
+{
+ int i = 0;
+ MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
+
+ /* create a height 4 tree */
+ while (mt_height(mt) < 4) {
+ mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
+ i += 9;
+ }
+
+ /* delete all entries one at a time, starting from the right */
+ do {
+ mas_erase(&mas);
+ } while (mas_prev(&mas, 0) != NULL);
+
+ mtree_unlock(mt);
+}
+
/* callback function used for check_nomem_writer_race() */
static void writer2(void *maple_tree)
{
@@ -36500,6 +36524,10 @@ void farmer_tests(void)
check_spanning_write(&tree);
mtree_destroy(&tree);
+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ check_collapsing_rebalance(&tree);
+ mtree_destroy(&tree);
+
mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
check_null_expand(&tree);
mtree_destroy(&tree);
--
2.43.0