Only a few v minor comments. On Tue, 15 Feb 2022 at 14:43, Liam Howlett <liam.howlett@xxxxxxxxxx> wrote: > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> > > The maple tree is an RCU-safe range based B-tree designed to use modern > processor cache efficiently. There are a number of places in the kernel > that a non-overlapping range-based tree would be beneficial, especially > one with a simple interface. The first user that is covered in this > patch set is the vm_area_struct, where three data structures are > replaced by the maple tree: the augmented rbtree, the vma cache, and the > linked list of VMAs in the mm_struct. The long term goal is to reduce > or remove the mmap_sem contention. > > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf > nodes. With the increased branching factor, it is significantly shorter than > the rbtree so it has fewer cache misses. The removal of the linked list > between subsequent entries also reduces the cache misses and the need to pull > in the previous and next VMA during many tree alterations. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> > Tested-by: David Howells <dhowells@xxxxxxxxxx> > --- > Documentation/core-api/index.rst | 1 + > Documentation/core-api/maple_tree.rst | 218 + > MAINTAINERS | 12 + > include/linux/maple_tree.h | 683 ++ > include/trace/events/maple_tree.h | 123 + > init/main.c | 2 + > lib/Kconfig.debug | 16 + > lib/Makefile | 3 +- > lib/maple_tree.c | 6967 +++++++++++++++++ > tools/testing/radix-tree/.gitignore | 2 + > tools/testing/radix-tree/Makefile | 13 +- > tools/testing/radix-tree/generated/autoconf.h | 1 + > tools/testing/radix-tree/linux/maple_tree.h | 7 + > tools/testing/radix-tree/maple.c | 59 + > .../radix-tree/trace/events/maple_tree.h | 3 + > 15 files changed, 8107 insertions(+), 3 deletions(-) > create mode 100644 Documentation/core-api/maple_tree.rst > create mode 100644 include/linux/maple_tree.h > create mode 100644 include/trace/events/maple_tree.h > create mode 100644 lib/maple_tree.c > create mode 100644 tools/testing/radix-tree/linux/maple_tree.h > create mode 100644 tools/testing/radix-tree/maple.c > create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h ... > +++ b/include/linux/maple_tree.h ... > +/* > + * The Maple Tree squeezes various bits in at various points which aren't > + * necessarily obvious. Usually, this is done by observing that pointers are > + * N-byte aligned and thus the bottom log_2(N) bits are available for use. We > + * don't use the high bits of pointers to store additional information because > + * we don't know what bits are unused on any given architecture. > + * > + * Nodes are 256 bytes in size and are also aligned to 256 bytes, giving us 8 > + * low bits for our own purposes. Nodes are currently of 4 types: Is it worth adding the compile-time check; static_assert(sizeof(struct maple_tree) == 256); ? ... > +struct ma_wr_state { > + struct ma_state *mas; > + struct maple_node *node; > + enum maple_type type; > + unsigned long r_min; > + unsigned long r_max; > + unsigned char offset_end; > + unsigned char node_end; > + unsigned long *pivots; > + unsigned long end_piv; > + void __rcu **slots; > + void *entry; > + void *content; > +}; Minor: Moving the member 'type' to below 'r_max', means it will remove the need for padding (reducing size from 88 to 80 byes). Does this matter? I doubt it. Better/worse for h/w cache? Suspect no difference. ... > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > @@ -827,6 +834,7 @@ config DEBUG_VM_RB > > If unsure, say N. > > + > config DEBUG_VM_PGFLAGS > bool "Debug page-flags operations" > depends on DEBUG_VM Stray blank line. ... > +++ b/lib/maple_tree.c > + * Note types: > + * 0x??1 = Root > + * 0x?00 = 16 bit nodes > + * 0x010 = 32 bit nodes > + * 0x110 = 64 bit nodes > + * > + * Slot size and alignment > + * 0x??1 : Root > + * 0x?00 : 16 bit values, type in 0-1, slot in 2-6 > + * 0x010 : 32 bit values, type in 0-2, slot in 3-6 > + * 0x110 : 64 bit values, type in 0-2, slit in 3-6 > + */ s/slit/slot/ > + > +#define MAPLE_PARENT_ROOT 0x01 ... > +/* > + * ma_meta_gap() - Get the largest gap location of a node from the metadat s/metadat/metadata/ ... > +/* > + * mast_rebalace_prev() - Rebalance against the previous node s/mast_rebalace_prev/mast_rebalance_prev/ > + * @mast: The maple subtree state > + * @old_l: The encoded maple node to the left (previous node) > + */ > +static inline void mast_rebalance_prev(struct maple_subtree_state *mast, > + struct maple_enode *old_l) ... > +static inline void *mas_state_walk(struct ma_state *mas) > +{ > + void *entry; > + > + entry = mas_start(mas); > + if (mas_is_none(mas)) > + return NULL; > + > + if (mas_is_ptr(mas)) > + return entry; > + > + return mtree_range_walk(mas); > +} A blank line after this mas_state_walk() function would help the formatting. > +/* > + * mtree_lookup_walk() - Internal quick lookup that does not keep maple state up > + * to date. > + * > + * @mas: The maple state. > + * > + * Note: Leaves mas in undesirable state. > + * Return: The entry for @mas->index or %NULL on dead node. > + */ > +static inline void *mtree_lookup_walk(struct ma_state *mas) ... > +/** > + * mt_prev() - get the previous value in the maple tree > + * @mt: The maple tree > + * @index: The start index > + * @min: The minumum index to check s/minumum/minimum/ ... > +/** > + * mas_nomem() - Check if there was an error allocating and do the allocation > + * if necessary If there are allocations, then free them. > + * @mas: The maple state > + * @gfp: The GFP_FALGS to use for allocations s/GFP_FALGS/GFP_FLAGS/ ... > +/** > + * mtree_insert_range() - Insert an entry at a give range if there is no value. > + * @mt: The maple tree > + * @first: The start of the range > + * @last: The end of the range > + * @entry: The entry to store > + * @gfp: The FGP_FLAGS to use for allocations. s/FGP_FLAGS/GFP_FLAGS/ ... Cheers, Mark