Hello, Matthew!

> On Wed, Dec 04, 2024 at 09:51:07AM +0100, Uladzislau Rezki wrote:
> > I think, when i have more free cycles, i will check it from performance
> > point of view. Because i do not know how much a maple tree is efficient
> > when it comes to lookups, insert and removing.
>
> Maple tree has a fanout of around 8-12 at each level, while an rbtree has
> a fanout of two (arguably 3, since we might find the node). Let's say you
> have 1000 vmalloc areas. A perfectly balanced rbtree would have 9 levels
> (and might well be 11+ levels if imperfectly balanced -- and part of the
> advantage of rbtrees over AVL trees is that they can be less balanced
> so need fewer rotations). A perfectly balanced maple tree would have
> only 3 levels.
>
Thank you for your explanation and the input on this topic. The higher
node density, lower tree height and bigger branching factor should
indeed make lookups work better :)

>
> Addition/removal is more expensive. We biased the implementation heavily
> towards lookup, so we chose to keep it very compact. Most users (and
> particularly the VMA tree which was our first client) do more lookups
> than modifications; a real application takes many more pagefaults than
> it does calls to mmap/munmap/mprotect/etc.
>
This is what I see. Some use cases degrade. For example, the stress-ng
forking benchmark is worse, and test_vmalloc.sh also reports a
regression. See the figures below:

<snip>
# Default
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+  59.52%   7.15%  [kernel]        [k] __vmalloc_node_range_noprof
+  37.98%   0.22%  [test_vmalloc]  [k] fix_size_alloc_test
+  37.32%   8.56%  [kernel]        [k] vfree.part.0
+  35.31%   0.00%  [kernel]        [k] ret_from_fork_asm
+  35.31%   0.00%  [kernel]        [k] ret_from_fork
+  35.31%   0.00%  [kernel]        [k] kthread
+  35.05%   0.00%  [test_vmalloc]  [k] test_func
+  34.16%   0.06%  [test_vmalloc]  [k] long_busy_list_alloc_test
+  32.10%   0.12%  [kernel]        [k] __get_vm_area_node
+  31.69%   1.82%  [kernel]        [k] alloc_vmap_area
+  27.24%   5.01%  [kernel]        [k] _raw_spin_lock
+  25.45%   0.15%  [test_vmalloc]  [k] full_fit_alloc_test
+  23.57%   0.03%  [kernel]        [k] remove_vm_area
+  22.23%  22.23%  [kernel]        [k] native_queued_spin_lock_slowpath
+  14.34%   0.94%  [kernel]        [k] alloc_pages_bulk_noprof
+  10.80%   7.51%  [kernel]        [k] free_vmap_area_noflush
+  10.59%  10.59%  [kernel]        [k] clear_page_rep
+   9.52%   8.96%  [kernel]        [k] insert_vmap_area
+   7.39%   2.82%  [kernel]        [k] find_unlink_vmap_area

# Maple-tree
time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+  74.33%   1.50%  [kernel]        [k] __vmalloc_node_range_noprof
+  55.73%   0.06%  [kernel]        [k] __get_vm_area_node
+  55.53%   1.07%  [kernel]        [k] alloc_vmap_area
+  53.78%   0.09%  [test_vmalloc]  [k] long_busy_list_alloc_test
+  53.75%   1.76%  [kernel]        [k] _raw_spin_lock
+  52.81%  51.80%  [kernel]        [k] native_queued_spin_lock_slowpath
+  28.93%   0.09%  [test_vmalloc]  [k] full_fit_alloc_test
+  23.29%   2.43%  [kernel]        [k] vfree.part.0
+  20.29%   0.01%  [kernel]        [k] mt_insert_vmap_area
+  20.27%   0.34%  [kernel]        [k] mtree_insert_range
+  15.30%   0.05%  [test_vmalloc]  [k] fix_size_alloc_test
+  14.06%   0.05%  [kernel]        [k] remove_vm_area
+  13.73%   0.00%  [kernel]        [k] ret_from_fork_asm
+  13.73%   0.00%  [kernel]        [k] ret_from_fork
+  13.73%   0.00%  [kernel]        [k] kthread
+  13.51%   0.00%  [test_vmalloc]  [k] test_func
+  13.15%   0.87%  [kernel]        [k] alloc_pages_bulk_noprof
+   9.92%   9.54%  [kernel]        [k] clear_page_rep
+   9.62%   0.07%  [kernel]        [k] find_unlink_vmap_area
+   9.55%   0.04%  [kernel]        [k] mtree_erase
+   5.92%   1.44%  [kernel]        [k] free_unref_page
+   4.92%   0.24%  [kernel]        [k] mas_insert.isra.0
+   4.69%   0.93%  [kernel]        [k] mas_erase
+   4.47%   0.02%  [kernel]        [k] rcu_do_batch
+   3.35%   2.10%  [kernel]        [k] __vmap_pages_range_noflush
+   3.00%   2.81%  [kernel]        [k] mas_wr_store_type
<snip>

i.e. insert/remove are more expensive, at least my tests show this. It
looks like mtree_insert() uses a range variant, which implies a tree
update after an insert operation is completed. That is probably where
the overhead comes from.

If I use a B+tree (my own implementation), it is, as expected, better
than the rb-tree because of B+tree properties. I have composed some
data; you can find more benchmark figures here:

wget ftp://vps418301.ovh.net/incoming/Maple_tree_comparison_with_rb_tree_in_vmalloc.pdf

> That's what maple trees do; they store non-overlapping ranges. So you
> can look up any address in a range and it will return you the pointer
> associated with that range. Just like you'd want for a page fault ;-)
>
Thank you, I see. I thought that it could also work as a regular B+ or
B-tree, so we do not spend cycles on updates to track ranges. Like the
code below:

	int ret = mtree_insert(t, va->va_start, va, GFP_KERNEL);

I do not store a range here, I store a key -> value pair, but the maple
tree considers it as the range [va_start:va_start].
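To make the comparison concrete, here is a rough sketch of what I mean.
It is not the actual kernel code; the demo_* helpers and the static
tree are made-up names for this mail, only mtree_insert(),
mtree_insert_range(), mtree_load() and mtree_erase() are the real
maple-tree API. The first helper is the point-key usage from my
experiment, the second is the range usage that
mt_insert_vmap_area()/mtree_insert_range() does in the perf output
above:

<snip>
#include <linux/gfp.h>
#include <linux/maple_tree.h>
#include <linux/vmalloc.h>

static DEFINE_MTREE(demo_vmap_mt);

/*
 * Point-key usage: key -> value only. Internally the maple tree still
 * stores it as the single-slot range [va_start:va_start].
 */
static int demo_insert_point(struct vmap_area *va)
{
	return mtree_insert(&demo_vmap_mt, va->va_start, va, GFP_KERNEL);
}

/*
 * Range usage: any address within [va_start:va_end) resolves to the
 * same vmap_area on lookup.
 */
static int demo_insert_range(struct vmap_area *va)
{
	return mtree_insert_range(&demo_vmap_mt, va->va_start,
				  va->va_end - 1, va, GFP_KERNEL);
}

/* Lookup by address; with the point-key variant only va_start matches. */
static struct vmap_area *demo_lookup(unsigned long addr)
{
	return mtree_load(&demo_vmap_mt, addr);
}

/* Remove and return the entry stored at va_start. */
static struct vmap_area *demo_remove(struct vmap_area *va)
{
	return mtree_erase(&demo_vmap_mt, va->va_start);
}
<snip>

Even the point-key form ends up in the range machinery internally,
which is what I suspect costs the extra cycles on insert/remove in my
tests.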
Maybe we can improve this case when a plain key, and not a range, is
passed? These are just my thoughts :)

--
Uladzislau Rezki