* Mark Hemment <markhemm@xxxxxxxxxxxxxx> [220216 05:12]: > Only a few v minor comments. > > On Tue, 15 Feb 2022 at 14:43, Liam Howlett <liam.howlett@xxxxxxxxxx> wrote: > > > > From: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> > > > > The maple tree is an RCU-safe range based B-tree designed to use modern > > processor cache efficiently. There are a number of places in the kernel > > that a non-overlapping range-based tree would be beneficial, especially > > one with a simple interface. The first user that is covered in this > > patch set is the vm_area_struct, where three data structures are > > replaced by the maple tree: the augmented rbtree, the vma cache, and the > > linked list of VMAs in the mm_struct. The long term goal is to reduce > > or remove the mmap_sem contention. > > > > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf > > nodes. With the increased branching factor, it is significantly shorter than > > the rbtree so it has fewer cache misses. The removal of the linked list > > between subsequent entries also reduces the cache misses and the need to pull > > in the previous and next VMA during many tree alterations. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> > > Tested-by: David Howells <dhowells@xxxxxxxxxx> > > --- > > Documentation/core-api/index.rst | 1 + > > Documentation/core-api/maple_tree.rst | 218 + > > MAINTAINERS | 12 + > > include/linux/maple_tree.h | 683 ++ > > include/trace/events/maple_tree.h | 123 + > > init/main.c | 2 + > > lib/Kconfig.debug | 16 + > > lib/Makefile | 3 +- > > lib/maple_tree.c | 6967 +++++++++++++++++ > > tools/testing/radix-tree/.gitignore | 2 + > > tools/testing/radix-tree/Makefile | 13 +- > > tools/testing/radix-tree/generated/autoconf.h | 1 + > > tools/testing/radix-tree/linux/maple_tree.h | 7 + > > tools/testing/radix-tree/maple.c | 59 + > > .../radix-tree/trace/events/maple_tree.h | 3 + > > 15 files changed, 8107 insertions(+), 3 deletions(-) > > create mode 100644 Documentation/core-api/maple_tree.rst > > create mode 100644 include/linux/maple_tree.h > > create mode 100644 include/trace/events/maple_tree.h > > create mode 100644 lib/maple_tree.c > > create mode 100644 tools/testing/radix-tree/linux/maple_tree.h > > create mode 100644 tools/testing/radix-tree/maple.c > > create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h > > ... > > +++ b/include/linux/maple_tree.h > ... > > +/* > > + * The Maple Tree squeezes various bits in at various points which aren't > > + * necessarily obvious. Usually, this is done by observing that pointers are > > + * N-byte aligned and thus the bottom log_2(N) bits are available for use. We > > + * don't use the high bits of pointers to store additional information because > > + * we don't know what bits are unused on any given architecture. > > + * > > + * Nodes are 256 bytes in size and are also aligned to 256 bytes, giving us 8 > > + * low bits for our own purposes. Nodes are currently of 4 types: > > Is it worth adding the compile-time check; > static_assert(sizeof(struct maple_tree) == 256); > ? sizeof(struct maple_node), but yes. That sounds reasonable. > ... > > > +struct ma_wr_state { > > + struct ma_state *mas; > > + struct maple_node *node; > > + enum maple_type type; > > + unsigned long r_min; > > + unsigned long r_max; > > + unsigned char offset_end; > > + unsigned char node_end; > > + unsigned long *pivots; > > + unsigned long end_piv; > > + void __rcu **slots; > > + void *entry; > > + void *content; > > +}; > > Minor: Moving the member 'type' to below 'r_max', means it will remove > the need for padding (reducing size from 88 to 80 byes). Does this > matter? I doubt it. Better/worse for h/w cache? Suspect no > difference. It wouldn't hurt. I'll make this change. > ... > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > @@ -827,6 +834,7 @@ config DEBUG_VM_RB > > > > If unsure, say N. > > > > + > > config DEBUG_VM_PGFLAGS > > bool "Debug page-flags operations" > > depends on DEBUG_VM > > Stray blank line. > ... Thanks > > > +++ b/lib/maple_tree.c > > > + * Note types: > > + * 0x??1 = Root > > + * 0x?00 = 16 bit nodes > > + * 0x010 = 32 bit nodes > > + * 0x110 = 64 bit nodes > > + * > > + * Slot size and alignment > > + * 0x??1 : Root > > + * 0x?00 : 16 bit values, type in 0-1, slot in 2-6 > > + * 0x010 : 32 bit values, type in 0-2, slot in 3-6 > > + * 0x110 : 64 bit values, type in 0-2, slit in 3-6 > > + */ > > s/slit/slot/ Thanks > > > + > > +#define MAPLE_PARENT_ROOT 0x01 > ... > > > +/* > > + * ma_meta_gap() - Get the largest gap location of a node from the metadat > > s/metadat/metadata/ Thanks > ... > > > +/* > > + * mast_rebalace_prev() - Rebalance against the previous node > > s/mast_rebalace_prev/mast_rebalance_prev/ Thanks. > > > > + * @mast: The maple subtree state > > + * @old_l: The encoded maple node to the left (previous node) > > + */ > > +static inline void mast_rebalance_prev(struct maple_subtree_state *mast, > > + struct maple_enode *old_l) > ... > > > +static inline void *mas_state_walk(struct ma_state *mas) > > +{ > > + void *entry; > > + > > + entry = mas_start(mas); > > + if (mas_is_none(mas)) > > + return NULL; > > + > > + if (mas_is_ptr(mas)) > > + return entry; > > + > > + return mtree_range_walk(mas); > > +} > > A blank line after this mas_state_walk() function would help the formatting. Thanks > > > +/* > > + * mtree_lookup_walk() - Internal quick lookup that does not keep maple state up > > + * to date. > > + * > > + * @mas: The maple state. > > + * > > + * Note: Leaves mas in undesirable state. > > + * Return: The entry for @mas->index or %NULL on dead node. > > + */ > > +static inline void *mtree_lookup_walk(struct ma_state *mas) > ... > > > +/** > > + * mt_prev() - get the previous value in the maple tree > > + * @mt: The maple tree > > + * @index: The start index > > + * @min: The minumum index to check > > s/minumum/minimum/ Thanks > ... > > > +/** > > + * mas_nomem() - Check if there was an error allocating and do the allocation > > + * if necessary If there are allocations, then free them. > > + * @mas: The maple state > > + * @gfp: The GFP_FALGS to use for allocations > > s/GFP_FALGS/GFP_FLAGS/ Thanks > ... > > > +/** > > + * mtree_insert_range() - Insert an entry at a give range if there is no value. > > + * @mt: The maple tree > > + * @first: The start of the range > > + * @last: The end of the range > > + * @entry: The entry to store > > + * @gfp: The FGP_FLAGS to use for allocations. > > s/FGP_FLAGS/GFP_FLAGS/ One day I will type this correctly the first time. Not today it seems. > ...