* Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [231005 11:55]:
> 
> 
> On 2023/10/4 22:25, Liam R. Howlett wrote:
> > * Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [231004 05:09]:
> > > 
> > > 
> > > On 2023/10/4 02:45, Liam R. Howlett wrote:
> > > > * Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [230924 23:58]:
> > > > > Introduce interfaces __mt_dup() and mtree_dup(), which are used to
> > > > > duplicate a maple tree. They duplicate a maple tree in Depth-First
> > > > > Search (DFS) pre-order traversal. It uses memcpy() to copy nodes in
> > > > > the source tree and allocates new child nodes in non-leaf nodes. The
> > > > > new node is exactly the same as the source node except for all the
> > > > > addresses stored in it. It will be faster than traversing all
> > > > > elements in the source tree and inserting them one by one into the
> > > > > new tree. The time complexity of these two functions is O(n).
> > > > >
> > > > > The difference between __mt_dup() and mtree_dup() is that mtree_dup()
> > > > > handles locks internally.
> > > > >
> > > > > Signed-off-by: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
> > > > > ---
> > > > >  include/linux/maple_tree.h |   3 +
> > > > >  lib/maple_tree.c           | 286 +++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 289 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > > > > index 666a3764ed89..de5a4056503a 100644
> > > > > --- a/include/linux/maple_tree.h
> > > > > +++ b/include/linux/maple_tree.h
> > > > > @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index,
> > > > >  		void *entry, gfp_t gfp);
> > > > >  void *mtree_erase(struct maple_tree *mt, unsigned long index);
> > > > >
> > > > > +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
> > > > > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
> > > > > +
> > > > >  void mtree_destroy(struct maple_tree *mt);
> > > > >  void __mt_destroy(struct maple_tree *mt);
> > > > >
> > > > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > > > > index 3fe5652a8c6c..ed8847b4f1ff 100644
> > > > > --- a/lib/maple_tree.c
> > > > > +++ b/lib/maple_tree.c
> > > > > @@ -6370,6 +6370,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index)
> > > > >  }
> > > > >  EXPORT_SYMBOL(mtree_erase);
> > > > >
> > > > > +/*
> > > > > + * mas_dup_free() - Free an incomplete duplication of a tree.
> > > > > + * @mas: The maple state of an incomplete tree.
> > > > > + *
> > > > > + * The parameter @mas->node passed in indicates that the allocation failed on
> > > > > + * this node. This function frees all nodes starting from @mas->node in the
> > > > > + * reverse order of mas_dup_build(). There is no need to hold the source tree
> > > > > + * lock at this time.
> > > > > + */
> > > > > +static void mas_dup_free(struct ma_state *mas)
> > > > > +{
> > > > > +	struct maple_node *node;
> > > > > +	enum maple_type type;
> > > > > +	void __rcu **slots;
> > > > > +	unsigned char count, i;
> > > > > +
> > > > > +	/* Maybe the first node allocation failed. */
> > > > > +	if (mas_is_none(mas))
> > > > > +		return;
> > > > > +
> > > > > +	while (!mte_is_root(mas->node)) {
> > > > > +		mas_ascend(mas);
> > > > > +
> > > > > +		if (mas->offset) {
> > > > > +			mas->offset--;
> > > > > +			do {
> > > > > +				mas_descend(mas);
> > > > > +				mas->offset = mas_data_end(mas);
> > > > > +			} while (!mte_is_leaf(mas->node));
> > > > > +
> > > > > +			mas_ascend(mas);
> > > > > +		}
> > > > > +
> > > > > +		node = mte_to_node(mas->node);
> > > > > +		type = mte_node_type(mas->node);
> > > > > +		slots = ma_slots(node, type);
> > > > > +		count = mas_data_end(mas) + 1;
> > > > > +		for (i = 0; i < count; i++)
> > > > > +			((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK;
> > > > > +
> > > > > +		mt_free_bulk(count, slots);
> > > > > +	}
> > > > > +
> > > > > +	node = mte_to_node(mas->node);
> > > > > +	mt_free_one(node);
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * mas_copy_node() - Copy a maple node and replace the parent.
> > > > > + * @mas: The maple state of the source tree.
> > > > > + * @new_mas: The maple state of the new tree.
> > > > > + * @parent: The parent of the new node.
> > > > > + *
> > > > > + * Copy @mas->node to @new_mas->node, set @parent to be the parent of
> > > > > + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM.
> > > > > + */
> > > > > +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas,
> > > > > +		struct maple_pnode *parent)
> > > > > +{
> > > > > +	struct maple_node *node = mte_to_node(mas->node);
> > > > > +	struct maple_node *new_node = mte_to_node(new_mas->node);
> > > > > +	unsigned long val;
> > > > > +
> > > > > +	/* Copy the node completely. */
> > > > > +	memcpy(new_node, node, sizeof(struct maple_node));
> > > > > +
> > > > > +	/* Update the parent node pointer. */
> > > > > +	val = (unsigned long)node->parent & MAPLE_NODE_MASK;
> > > > > +	new_node->parent = ma_parent_ptr(val | (unsigned long)parent);
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * mas_dup_alloc() - Allocate child nodes for a maple node.
> > > > > + * @mas: The maple state of the source tree.
> > > > > + * @new_mas: The maple state of the new tree.
> > > > > + * @gfp: The GFP_FLAGS to use for allocations.
> > > > > + *
> > > > > + * This function allocates child nodes for @new_mas->node during the
> > > > > + * duplication process. If memory allocation fails, @mas is set to -ENOMEM.
> > > > > + */
> > > > > +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
> > > > > +		gfp_t gfp)
> > > > > +{
> > > > > +	struct maple_node *node = mte_to_node(mas->node);
> > > > > +	struct maple_node *new_node = mte_to_node(new_mas->node);
> > > > > +	enum maple_type type;
> > > > > +	unsigned char request, count, i;
> > > > > +	void __rcu **slots;
> > > > > +	void __rcu **new_slots;
> > > > > +	unsigned long val;
> > > > > +
> > > > > +	/* Allocate memory for child nodes. */
> > > > > +	type = mte_node_type(mas->node);
> > > > > +	new_slots = ma_slots(new_node, type);
> > > > > +	request = mas_data_end(mas) + 1;
> > > > > +	count = mt_alloc_bulk(gfp, request, (void **)new_slots);
> > > > > +	if (unlikely(count < request)) {
> > > > > +		if (count) {
> > > > > +			mt_free_bulk(count, new_slots);
> > > >
> > > > If you look at mm/slab.c:kmem_cache_alloc_bulk(), you will see that
> > > > the error path already bulk frees for you - but does not zero the
> > > > array. This bulk free will lead to a double free, but you do need the
> > > > below memset(). Also, it will return 0 or request.
> > > > So, I think this code is never executed as it is written.
> > >
> > > If kmem_cache_alloc_bulk() is called to allocate memory in
> > > mt_alloc_bulk(), then this code will not be executed, because it only
> > > returns 0 or request. However, I am concerned that changes to
> > > mt_alloc_bulk() like [1] may be merged, which could potentially lead
> > > to memory leaks. To improve robustness, I wrote it this way.
> > >
> > > How do you think it should be handled? Is it okay to handle it like
> > > the code below?
> > >
> > > if (unlikely(count < request)) {
> > > 	memset(new_slots, 0, request * sizeof(unsigned long));
> > > 	mas_set_err(mas, -ENOMEM);
> > > 	return;
> > > }
> > >
> > > [1] https://lore.kernel.org/lkml/20230810163627.6206-13-vbabka@xxxxxxx/
> >
> > Ah, I see.
> >
> > We should keep the same functionality as before. The code you are
> > referencing is an RFC and won't be merged as-is. We should be sure to
> > keep an eye on this happening.
> >
> > I think the code you have there is correct.
> >
> > > > I don't think this will show up in your test cases because the test
> > > > code doesn't leave dangling pointers and simply returns 0 if there
> > > > aren't enough nodes.
> > >
> > > Yes, no testing here.
> >
> > Yeah :/ I think we should update the test code at some point to behave
> > the same as the real code. Don't worry about it here though.
>
> If we want to test this here, we need to modify kmem_cache_alloc_bulk()
> in user space so that it allocates only a portion of the requested
> memory. This would make it behave differently from the corresponding
> function in kernel space. I'm not sure if this modification is
> acceptable.

Well, no. We can change the test code to do the same as the kernel - if
there aren't enough nodes, then put pointers to the ones we have in the
array but don't actually allocate them (or free them). This way we will
catch double frees or memory leaks. Essentially, leave the partial data
in the array.

If you want to test it in-kernel, then you could alter the kernel
function to check the task name with some global counter to cause it to
fail out after some count. That would just be a one-off test, though.

> Also, I might need to move the memset() outside of the if statement
> (if (unlikely(count < request)) {}) to use it for cleaning up residual
> pointers.

Outside the "if (count)" statement? Yes. I think you have this above in
the "code below" section, right?

Thanks,
Liam
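
For anyone wanting to see it spelled out, below is a minimal sketch of
the test-shim behaviour described above. It is illustrative only: the
test_alloc_bulk() and fail_at names are hypothetical, not the actual
tools/testing/radix-tree code. The point is the error path: on a
simulated partial failure it frees the nodes it did get, but leaves the
stale pointer values in the caller's array and returns 0, mirroring the
kernel's kmem_cache_alloc_bulk() error path.

#include <stdlib.h>

/* Hypothetical fault-injection knob: fail once this many nodes are handed out. */
static int fail_at = -1;

int test_alloc_bulk(size_t size, size_t nr, void **p)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (fail_at >= 0 && i >= (size_t)fail_at)
			goto error;
		p[i] = malloc(size);
		if (!p[i])
			goto error;
	}
	return (int)nr;

error:
	/*
	 * Mirror the kernel error path: free what was allocated, but do
	 * NOT zero the array. p[0..i-1] keep their stale values, so a
	 * caller that bulk-frees them again double-frees, and a caller
	 * that forgets them is caught by leak checking.
	 */
	while (i--)
		free(p[i]);
	return 0;
}

With something like this in the userspace shim, a forced failure in
mas_dup_alloc() would exercise mas_dup_free() against the same array
contents the kernel would leave behind, which is exactly what catches
the double free discussed above.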