Hello,

This is v13 plus fixes, still based on mm-unstable
v6.0-rc1-140-geb22a5b1b495

git: https://github.com/oracle/linux-uek/tree/howlett/maple/20220906

Patch series "Introducing the Maple Tree".

The maple tree is an RCU-safe range-based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
where a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlohr said

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit.
: This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

Similar work has been discovered in the academic press

	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a B-tree variant and a rough
outline, we researched range-based B-trees and RCU B-trees and did find
that article.  So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.

Changes:
 - Documentation warning fix - Thanks Stephen Rothwell
 - Fixed mlock start address when memory tags are used - Mark Brown &
   Catalin Marinas
 - Added fix to nommu from Yang Yingliang
 - Added git commit messages for mprotect, mremap, oom_kill, and mremap
   iterating - Thanks Davidlohr Bueso
 - Added more information to the test_maple_tree.c git change log - Thanks
   Davidlohr Bueso
 - Changed fs/proc/base to vmi iterator - Thanks Davidlohr Bueso
 - Minor cleanup to mm/khugepaged patch - Thanks Davidlohr Bueso
 - Added more Reviewed-by - Thanks Davidlohr Bueso
 - Added note about BUG_ON()s to git change log - Thanks Andrew Morton
 - Fixed accounting error internal to the patch set to mm->map_count in
   do_brk_flags()

v13: https://lore.kernel.org/linux-mm/20220822150128.1562046-1-Liam.Howlett@xxxxxxxxxx/
v12: https://lore.kernel.org/linux-mm/20220720021727.17018-1-Liam.Howlett@xxxxxxxxxx/
v11: https://lore.kernel.org/linux-mm/20220717024615.2106835-1-Liam.Howlett@xxxxxxxxxx/
v10: https://lore.kernel.org/linux-mm/20220621204632.3370049-1-Liam.Howlett@xxxxxxxxxx/
v9: https://lore.kernel.org/lkml/20220504010716.661115-1-Liam.Howlett@xxxxxxxxxx/
 ...and https://lore.kernel.org/lkml/20220504011215.661968-1-Liam.Howlett@xxxxxxxxxx/
v8: https://lore.kernel.org/lkml/20220426150616.3937571-1-Liam.Howlett@xxxxxxxxxx/
v7: https://lore.kernel.org/linux-mm/20220404143501.2016403-8-Liam.Howlett@xxxxxxxxxx/
v6: https://lore.kernel.org/linux-mm/20220215143728.3810954-1-Liam.Howlett@xxxxxxxxxx/
v5: https://lore.kernel.org/linux-mm/20220202024137.2516438-1-Liam.Howlett@xxxxxxxxxx/
v4: https://lore.kernel.org/linux-mm/20211201142918.921493-1-Liam.Howlett@xxxxxxxxxx/
v3: https://lore.kernel.org/linux-mm/20211005012959.1110504-1-Liam.Howlett@xxxxxxxxxx/
v2: https://lore.kernel.org/linux-mm/20210817154651.1570984-1-Liam.Howlett@xxxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20210428153542.2814175-1-Liam.Howlett@xxxxxxxxxx/

Liam R. Howlett (45):
  Maple Tree: add new data structure
  radix tree test suite: add pr_err define
  radix tree test suite: add kmem_cache_set_non_kernel()
  radix tree test suite: add allocation counts and size to kmem_cache
  radix tree test suite: add support for slab bulk APIs
  radix tree test suite: add lockdep_is_held to header
  lib/test_maple_tree: add testing for maple tree
  mm: start tracking VMAs with maple tree
  mm/mmap: use the maple tree in find_vma() instead of the rbtree.
  mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
  mm/mmap: use maple tree for unmapped_area{_topdown}
  kernel/fork: use maple tree for dup_mmap() during forking
  damon: convert __damon_va_three_regions to use the VMA iterator
  mm: remove rb tree.
  mmap: change zeroing of maple tree in __vma_adjust()
  xen: use vma_lookup() in privcmd_ioctl_mmap()
  mm: optimize find_exact_vma() to use vma_lookup()
  mm/khugepaged: optimize collapse_pte_mapped_thp() by using vma_lookup()
  mm/mmap: change do_brk_flags() to expand existing VMA and add
    do_brk_munmap()
  mm: use maple tree operations for find_vma_intersection()
  mm/mmap: use advanced maple tree API for mmap_region()
  mm: remove vmacache
  mm: convert vma_lookup() to use mtree_load()
  mm/mmap: move mmap_region() below do_munmap()
  mm/mmap: reorganize munmap to use maple states
  mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
  arm64: Change elfcore for_each_mte_vma() to use VMA iterator
  fs/proc/base: use the vma iterators in place of linked list
  userfaultfd: use maple tree iterator to iterate VMAs
  ipc/shm: use VMA iterator instead of linked list
  bpf: remove VMA linked list
  mm/gup: use maple tree navigation instead of linked list
  mm/madvise: use vma_find() instead of vma linked list
  mm/memcontrol: stop using mm->highest_vm_end
  mm/mempolicy: use vma iterator & maple state instead of vma linked list
  mm/mprotect: use maple tree navigation instead of VMA linked list
  mm/mremap: use vma_find_intersection() instead of vma linked list
  mm/msync: use vma_find() instead of vma linked list
  mm/oom_kill: use vma iterators instead of vma linked list
  mm/swapfile: use vma iterator instead of vma linked list
  riscv: use vma iterator for vdso
  mm/vmscan: Use vma iterator instead of vm_next
  mm: remove the vma linked list
  mm/mmap: drop range_has_overlap() function
  mm/mmap.c: pass in mapping to __vma_link_file()

Matthew Wilcox (Oracle) (25):
  mm: add VMA iterator
  mmap: use the VMA iterator in count_vma_pages_range()
  proc: remove VMA rbtree use from nommu
  arm64: remove mmap linked list from vdso
  parisc: remove mmap linked list from cache handling
  powerpc: remove mmap linked list walks
  s390: remove vma linked list walks
  x86: remove vma linked list walks
  xtensa: remove vma linked list walks
  cxl: remove vma linked list walk
  optee: remove vma linked list walk
  um: remove vma linked list walk
  coredump: remove vma linked list walk
  exec: use VMA iterator instead of linked list
  fs/proc/task_mmu: stop using linked list and highest_vm_end
  acct: use VMA iterator instead of linked list
  perf: use VMA iterator
  sched: use maple tree iterator to walk VMAs
  fork: use VMA iterator
  mm/khugepaged: stop using vma linked list
  mm/ksm: use vma iterators instead of vma linked list
  mm/mlock: use vma iterator and maple state instead of vma linked list
  mm/pagewalk: use vma_find() instead of vma linked list
  i915: use the VMA iterator
  nommu: remove uses of VMA linked list

 Documentation/core-api/index.rst              |     1 +
 Documentation/core-api/maple_tree.rst         |   217 +
 MAINTAINERS                                   |    12 +
 arch/arm64/kernel/elfcore.c                   |    16 +-
 arch/arm64/kernel/vdso.c                      |     3 +-
 arch/parisc/kernel/cache.c                    |     9 +-
 arch/powerpc/kernel/vdso.c                    |     6 +-
 arch/powerpc/mm/book3s32/tlb.c                |    11 +-
 arch/powerpc/mm/book3s64/subpage_prot.c       |    13 +-
 arch/riscv/kernel/vdso.c                      |     3 +-
 arch/s390/kernel/vdso.c                       |     3 +-
 arch/s390/mm/gmap.c                           |     6 +-
 arch/um/kernel/tlb.c                          |    14 +-
 arch/x86/entry/vdso/vma.c                     |     9 +-
 arch/x86/kernel/tboot.c                       |     2 +-
 arch/xtensa/kernel/syscall.c                  |    18 +-
 drivers/firmware/efi/efi.c                    |     2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |    14 +-
 drivers/misc/cxl/fault.c                      |    45 +-
 drivers/tee/optee/call.c                      |    18 +-
 drivers/xen/privcmd.c                         |     2 +-
 fs/coredump.c                                 |    34 +-
 fs/exec.c                                     |    12 +-
 fs/proc/base.c                                |     5 +-
 fs/proc/internal.h                            |     2 +-
 fs/proc/task_mmu.c                            |    74 +-
 fs/proc/task_nommu.c                          |    45 +-
 fs/userfaultfd.c                              |    62 +-
 include/linux/maple_tree.h                    |   685 +
 include/linux/mm.h                            |    78 +-
 include/linux/mm_types.h                      |    43 +-
 include/linux/mm_types_task.h                 |    12 -
 include/linux/sched.h                         |     1 -
 include/linux/userfaultfd_k.h                 |     7 +-
 include/linux/vm_event_item.h                 |     4 -
 include/linux/vmacache.h                      |    28 -
 include/linux/vmstat.h                        |     6 -
 include/trace/events/maple_tree.h             |   123 +
 include/trace/events/mmap.h                   |    73 +
 init/main.c                                   |     2 +
 ipc/shm.c                                     |    21 +-
 kernel/acct.c                                 |    11 +-
 kernel/bpf/task_iter.c                        |    10 +-
 kernel/debug/debug_core.c                     |    12 -
 kernel/events/core.c                          |     3 +-
 kernel/events/uprobes.c                       |     9 +-
 kernel/fork.c                                 |    62 +-
 kernel/sched/fair.c                           |    10 +-
 lib/Kconfig.debug                             |    17 +-
 lib/Makefile                                  |     2 +-
 lib/maple_tree.c                              |  7130 +++
 lib/test_maple_tree.c                         | 38307 ++++++++++++++++
 mm/Makefile                                   |     2 +-
 mm/damon/vaddr-test.h                         |    36 +-
 mm/damon/vaddr.c                              |    53 +-
 mm/debug.c                                    |    14 +-
 mm/gup.c                                      |     7 +-
 mm/huge_memory.c                              |     4 +-
 mm/init-mm.c                                  |     4 +-
 mm/internal.h                                 |     8 +-
 mm/khugepaged.c                               |    11 +-
 mm/ksm.c                                      |    18 +-
 mm/madvise.c                                  |     2 +-
 mm/memcontrol.c                               |     6 +-
 mm/memory.c                                   |    33 +-
 mm/mempolicy.c                                |    56 +-
 mm/mlock.c                                    |    35 +-
 mm/mmap.c                                     |  2154 +-
 mm/mprotect.c                                 |     8 +-
 mm/mremap.c                                   |    22 +-
 mm/msync.c                                    |     2 +-
 mm/nommu.c                                    |   260 +-
 mm/oom_kill.c                                 |     3 +-
 mm/pagewalk.c                                 |     2 +-
 mm/swapfile.c                                 |     4 +-
 mm/util.c                                     |    32 -
 mm/vmacache.c                                 |   117 -
 mm/vmscan.c                                   |    15 +-
 mm/vmstat.c                                   |     4 -
 tools/include/linux/slab.h                    |     4 +
 tools/testing/radix-tree/.gitignore           |     2 +
 tools/testing/radix-tree/Makefile             |     9 +-
 tools/testing/radix-tree/generated/autoconf.h |     1 +
 tools/testing/radix-tree/linux.c              |   160 +-
 tools/testing/radix-tree/linux/kernel.h       |     1 +
 tools/testing/radix-tree/linux/lockdep.h      |     2 +
 tools/testing/radix-tree/linux/maple_tree.h   |     7 +
 tools/testing/radix-tree/maple.c              |    59 +
 .../radix-tree/trace/events/maple_tree.h      |     5 +
 89 files changed, 48581 insertions(+), 1895 deletions(-)
 create mode 100644 Documentation/core-api/maple_tree.rst
 create mode 100644 include/linux/maple_tree.h
 delete mode 100644 include/linux/vmacache.h
 create mode 100644 include/trace/events/maple_tree.h
 create mode 100644 lib/maple_tree.c
 create mode 100644 lib/test_maple_tree.c
 delete mode 100644 mm/vmacache.c
 create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
 create mode 100644 tools/testing/radix-tree/maple.c
 create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h

-- 
2.35.1