Hello, This months update of the maple tree development and work items follows. I send out an updated list monthly with peoples names against work items to avoid work duplication, so please don't work on something with a name next to the work item. If you want to work on something then please respond to this email so everyone knows, and I can add your name to the item. Feel free to ask questions to clarify the feature or discuss directions. Likewise, if you cannot work on anything anymore then let me know so others can. If you have any ideas, then please email the list. We can discuss it and I can add it to the list. Please sign up to the maple tree mailing list [1] to get all updates. Features: - Better preallocation calculations - Liam R. Howlett 2023/07/07 Without tracking the tree status on the walk down, we can partially walk down to figure out the type of write and which 'worst case' a write will cause. Using this "write type" information, the preallocations can be a better estimate. v3 was sent to the mailing list [2] and is in v6.6-rc1. - mas_store_gfp() align with mas_prealloc() Right now mas_store_gfp() is going to allocate once it figures out what to do and it does the worst case allocations. This is inefficient and can easily be done better by doing more of what mas_prealloc()/mas_store_prealloc() does - or at least will be doing once the 'Better preallocation calculations' lands. - Tracking tree status on walks down Track in the maple state when the last minimum nodes, space for 2, or space for 3 slots occurred (height in the tree). This will allow for a better estimate on how many nodes are needed for a write. This can be complicated on next/prev walking, etc. It has to be done in read mode since some walk may have been done before deciding to write - see mmap_region(), for example. - Store type (enum for store type?) Extending the "Tracking tree status on walks down", the information can then be used to determine what operation is needed during a mas_prealloct(), etc. The operation can be stored in the maple state to continue on a mas_store_prealloc(). Obviously, this would work out well if we have mas_store_gfp() using the same functionality as mentioned above. - Full count/flag & Dense nodes There is a bit that exists in the pointer reserved to indicate there is no NULLs in the leaves below. This feature is mainly for trees that are non-alloc trees. We can then know there is at least one singleton that can be stored below this entry. This is coupled with restoring Dense nodes as a potential node type. The tricky part is deciding on when to switch to dense nodes/back from dense nodes (all entries to dense nodes must be singletons!). See mte_set_full(), mte_clear_full(), mte_has_null() which use MAPLE_ENODE_NULL. - Fork & Dup tree + Delete DONT_COPY - Peng Zhang 2023/07/12 This is to optimize dup_mmap() in kernel/fork.c, but other users may want faster duplications of a tree. This should be faster than building a tree in bulk mode. The idea is to just copy each node and replace the pointers in, probably, a BFS order. Once the leaves are reached, the VMAs will be replaced by the copies made in fork, unless DONT_COPY is set, in which case the VMA will be deleted from the copied tree. DONT_COPY is not common and since the tree isn't visible, all nodes should be available for reuse (no RCU worries). v2 was sent to the mailing list [3]. - Push reuse parent During an insert, new nodes can be "pushed" left or right - see mas_push_data(). On a left push, it may be possible to reuse the node by making the write like an 'append'. This may also be possible to be done during a split operation when the write extends the last slot. This needs examining to ensure RCU safety. - Contiguous iterator Some users want to iterate across a range of items, but only contiguous items. Sometimes a gap within the range is considered an error while other times they are not. Good targets for this feature are mlock() and mprotect() system calls. The goal is to provide an easy to use interface that reduces the complexity of the external code that keeps track of the contiguous nature of the iteration. - Remove BIG_NODE maple tree nodes are rebalanced, split, and updated using a maple_big_node struct. This structure is over twice the size of regular nodes to ensure there is enough space to contain the data. Ideally the same feats can be accomplished by using regular sized nodes. - Test & Fix destroy_rebalance Destroy_rebalance needs better testing when it occurs in rcu mode. Currently it is not used in this mode, but it should be supported. This part of the code needs better test coverage and fix any necessary issues that arise with rcu readers during updates. - More userspace testing - Liam R. Howlett 2023/09/12 Add userspace fuzzer testing to the test suite. There was a patch sent out to create a basic clang fuzzer. Using something along those lines, write a fuzzer to test the maple tree. - Live locking with delays for testing Add some sort of functional delay operation to work with testing to reduce potential locking issues, especially with CPU contention. - mas_for_each caching of end of node Adding metadata for the end of node was a big performance win. Extend this by caching the end of the node while iterating (to start). Later this could be expanded to other iterators. - Add min split span to parents When nodes are split, the parent node splits by number of entries. It may be better to split by range as it would help avoid splitting more frequently on inserts. - Search Marks To complete the features that xarray support, the maple tree needs to support search marks. Marks will replace the gap tracking in a new node type complete with searching for a specific tag from the top-level to the leaves; just like the gaps that exist in the allocation nodes. It will probably need two new node types (range and dense). Dense with 29 pointers, range64 with 15 pointers. Larger Features: - 64 bit stores on 32 bit arch A number of users want to use the maple tree, but cannot because they need a storage medium that supports 64 bit stores on 32 bit arch. - wr_mas with alloc instead of mas Internally, we have a write maple state, but the maple state currently holds the allocations. I'm not sure if this is worth it, and it will probably need "Tracking tree status on walks down" so that preallocations/allocations can be accurate. Other considerations are how mas_prealloc() would work with this change. - Big Dense Node Type There are various node types that could be added to the maple tree. One of the most useful is probably a 'big dense' with the node being a 4k block of singletons. This would have to come after the dense node type as dense enables more users and helps scope this work. - Judy Array The Judy Array is another fancy data structure. Mathieu Desnoyers has enquired and spoken to us about incorporating judy arrays as a node type or other features he has in the judy array. This is in the early thought stages. Please note that any new features will need test cases added to ensure they function correctly. [1] https://lists.infradead.org/mailman/listinfo/maple-tree [2] http://lists.infradead.org/pipermail/maple-tree/2023-July/002736.html [3] https://lore.kernel.org/linux-mm/20230830125654.21257-1-zhangpeng.00@xxxxxxxxxxxxx/