On Thu, Apr 04, 2019 at 05:43:20PM +0200, Uladzislau Rezki wrote:
> Hello, Roman!
> 
> > 
> > The patch looks really good to me! I've tried hard, but didn't find
> > any serious issues/bugs. Some small nits below.
> > 
> > Thank you for working on it!
> > 
> I try my best, thank you for your review!
> 
> > BTW, when sending a new iteration, please use "[PATCH vX]" subject prefix,
> > e.g. [PATCH v3 1/3] mm/vmap: keep track of free blocks for vmap allocation".
> > RESEND usually means that you're sending the same version, e.g. when
> > you need cc more people.
> > 
> Thank you for the clarification. I will fix that next time.
> 
> > 
> > On Tue, Apr 02, 2019 at 06:25:29PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Currently an allocation of the new vmap area is done over busy
> > > list iteration(complexity O(n)) until a suitable hole is found
> > > between two busy areas. Therefore each new allocation causes
> > > the list being grown. Due to over fragmented list and different
> > > permissive parameters an allocation can take a long time. For
> > > example on embedded devices it is milliseconds.
> > > 
> > > This patch organizes the KVA memory layout into free areas of the
> > > 1-ULONG_MAX range. It uses an augment red-black tree that keeps
> > > blocks sorted by their offsets in pair with linked list keeping
> > > the free space in order of increasing addresses.
> > > 
> > > Each vmap_area object contains the "subtree_max_size" that reflects
> > > a maximum available free block in its left or right sub-tree. Thus,
> > > that allows to take a decision and traversal toward the block that
> > > will fit and will have the lowest start address, i.e. sequential
> > > allocation.
> > 
> > I'd add here that an augmented red-black tree is used, and nodes
> > are augmented with the size of the maximum available free block.
> > 
> Will add.
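As a side note for readers following the thread: the lowest-fit walk that subtree_max_size enables can be sketched as a toy userspace model. This is only an illustration of the idea, not the patch's code; the names are made up, and it ignores the alignment and vstart handling the real allocator has to do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: models only the fields relevant to the search,
 * NOT the kernel's struct vmap_area. */
struct node {
	unsigned long va_start, va_end;	/* boundaries of this free block */
	unsigned long subtree_max_size;	/* max free-block size in this subtree */
	struct node *left, *right;
};

static unsigned long node_size(const struct node *n)
{
	return n->va_end - n->va_start;
}

/*
 * Descend left whenever the left subtree advertises a block big
 * enough for the request; since left-subtree addresses are lower,
 * this returns the fitting block with the lowest start address.
 */
static struct node *find_lowest_fit(struct node *n, unsigned long size)
{
	while (n) {
		if (n->left && n->left->subtree_max_size >= size) {
			n = n->left;
		} else {
			if (node_size(n) >= size)
				return n;
			n = n->right;
		}
	}
	return NULL;	/* no free block is large enough */
}
```

With free blocks [100..140), [200..210) and [300..400), a 30-byte request lands in the lowest block [100..140), and a 50-byte request skips past the too-small root straight to [300..400), without ever scanning a busy list.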
> > > 
> > > Allocation: to allocate a new block a search is done over the
> > > tree until a suitable lowest(left most) block is large enough
> > > to encompass: the requested size, alignment and vstart point.
> > > If the block is bigger than requested size - it is split.
> > > 
> > > De-allocation: when a busy vmap area is freed it can either be
> > > merged or inserted to the tree. Red-black tree allows efficiently
> > > find a spot whereas a linked list provides a constant-time access
> > > to previous and next blocks to check if merging can be done. In case
> > > of merging of de-allocated memory chunk a large coalesced area is
> > > created.
> > > 
> > > Complexity: ~O(log(N))
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > ---
> > >  include/linux/vmalloc.h |    6 +-
> > >  mm/vmalloc.c            | 1004 +++++++++++++++++++++++++++++++++++------------
> > >  2 files changed, 762 insertions(+), 248 deletions(-)
> > > 
> > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > > index 398e9c95cd61..ad483378fdd1 100644
> > > --- a/include/linux/vmalloc.h
> > > +++ b/include/linux/vmalloc.h
> > > @@ -45,12 +45,16 @@ struct vm_struct {
> > >  struct vmap_area {
> > >  	unsigned long va_start;
> > >  	unsigned long va_end;
> > > +
> > > +	/*
> > > +	 * Largest available free size in subtree.
> > > +	 */
> > > +	unsigned long subtree_max_size;
> > >  	unsigned long flags;
> > >  	struct rb_node rb_node;		/* address sorted rbtree */
> > >  	struct list_head list;		/* address sorted list */
> > >  	struct llist_node purge_list;	/* "lazy purge" list */
> > >  	struct vm_struct *vm;
> > > -	struct rcu_head rcu_head;
> > >  };
> > > 
> > >  /*
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 755b02983d8d..3adbad3fb6c1 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -31,6 +31,7 @@
> > >  #include <linux/compiler.h>
> > >  #include <linux/llist.h>
> > >  #include <linux/bitops.h>
> > > +#include <linux/rbtree_augmented.h>
> > > 
> > >  #include <linux/uaccess.h>
> > >  #include <asm/tlbflush.h>
> > > @@ -320,9 +321,7 @@ unsigned long vmalloc_to_pfn(const void *vmalloc_addr)
> > >  }
> > >  EXPORT_SYMBOL(vmalloc_to_pfn);
> > > 
> > > -
> > >  /*** Global kva allocator ***/
> > > -
> > 
> > Do we need this change?
> > 
> This patch does not tend to refactor the code. I have removed extra empty
> lines because i touched the code around. I can either keep that change or
> remove it. What is your opinion?

Usually it's better to separate cosmetic changes from functional,
if you're not directly touching these lines. Not a big deal, of course.

> > >  #define VM_LAZY_FREE	0x02
> > >  #define VM_VM_AREA	0x04
> > > 
> > > @@ -331,14 +330,76 @@ static DEFINE_SPINLOCK(vmap_area_lock);
> > >  LIST_HEAD(vmap_area_list);
> > >  static LLIST_HEAD(vmap_purge_list);
> > >  static struct rb_root vmap_area_root = RB_ROOT;
> > > +static bool vmap_initialized __read_mostly;
> > > +
> > > +/*
> > > + * This kmem_cache is used for vmap_area objects. Instead of
> > > + * allocating from slab we reuse an object from this cache to
> > > + * make things faster. Especially in "no edge" splitting of
> > > + * free block.
> > > + */
> > > +static struct kmem_cache *vmap_area_cachep;
> > > +
> > > +/*
> > > + * This linked list is used in pair with free_vmap_area_root.
> > > + * It gives O(1) access to prev/next to perform fast coalescing.
> > > + */
> > > +static LIST_HEAD(free_vmap_area_list);
> > > +
> > > +/*
> > > + * This augment red-black tree represents the free vmap space.
> > > + * All vmap_area objects in this tree are sorted by va->va_start
> > > + * address. It is used for allocation and merging when a vmap
> > > + * object is released.
> > > + *
> > > + * Each vmap_area node contains a maximum available free block
> > > + * of its sub-tree, right or left. Therefore it is possible to
> > > + * find a lowest match of free area.
> > > + */
> > > +static struct rb_root free_vmap_area_root = RB_ROOT;
> > > 
> > > -/* The vmap cache globals are protected by vmap_area_lock */
> > > -static struct rb_node *free_vmap_cache;
> > > -static unsigned long cached_hole_size;
> > > -static unsigned long cached_vstart;
> > > -static unsigned long cached_align;
> > > +static __always_inline unsigned long
> > > +__va_size(struct vmap_area *va)
> > > +{
> > > +	return (va->va_end - va->va_start);
> > > +}
> > > +
> > > +static __always_inline unsigned long
> > > +get_subtree_max_size(struct rb_node *node)
> > > +{
> > > +	struct vmap_area *va;
> > > 
> > > -static unsigned long vmap_area_pcpu_hole;
> > > +	va = rb_entry_safe(node, struct vmap_area, rb_node);
> > > +	return va ? va->subtree_max_size : 0;
> > > +}
> > > +
> > > +/*
> > > + * Gets called when remove the node and rotate.
> > > + */
> > > +static __always_inline unsigned long
> > > +compute_subtree_max_size(struct vmap_area *va)
> > > +{
> > > +	unsigned long max_size = __va_size(va);
> > > +	unsigned long child_max_size;
> > > +
> > > +	child_max_size = get_subtree_max_size(va->rb_node.rb_right);
> > > +	if (child_max_size > max_size)
> > > +		max_size = child_max_size;
> > > +
> > > +	child_max_size = get_subtree_max_size(va->rb_node.rb_left);
> > > +	if (child_max_size > max_size)
> > > +		max_size = child_max_size;
> > > +
> > > +	return max_size;
> > 
> > Nit: you can use max3 instead, e.g. :
> > 
> > return max3(__va_size(va),
> > 	get_subtree_max_size(va->rb_node.rb_left),
> > 	get_subtree_max_size(va->rb_node.rb_right));
> > 
> Good point. Will replace it!
> 
> > > +}
> > > +
> > > +RB_DECLARE_CALLBACKS(static, free_vmap_area_rb_augment_cb,
> > > +	struct vmap_area, rb_node, unsigned long, subtree_max_size,
> > > +	compute_subtree_max_size)
> > > +
> > > +static void purge_vmap_area_lazy(void);
> > > +static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
> > > +static unsigned long lazy_max_pages(void);
> > > 
> > >  static struct vmap_area *__find_vmap_area(unsigned long addr)
> > >  {
> > > @@ -359,41 +420,520 @@ static struct vmap_area *__find_vmap_area(unsigned long addr)
> > >  	return NULL;
> > >  }
> > > 
> > > -static void __insert_vmap_area(struct vmap_area *va)
> > > -{
> > > -	struct rb_node **p = &vmap_area_root.rb_node;
> > > -	struct rb_node *parent = NULL;
> > > -	struct rb_node *tmp;
> > > +/*
> > > + * This function returns back addresses of parent node
> > > + * and its left or right link for further processing.
> > > + */
> > > +static __always_inline struct rb_node **
> > > +__find_va_links(struct vmap_area *va,
> > > +	struct rb_root *root, struct rb_node *from,
> > > +	struct rb_node **parent)
> > 
> > The function looks much cleaner now, thank you!
> > 
> > But if I understand it correctly, it returns a node (via parent)
> > and a pointer to one of two links, so that the returned value
> > is always == parent + some constant offset.
> > If so, I wonder if it's cleaner to return a parent node
> > (as rb_node*) and a bool value which will indicate if the left
> > or the right link should be used.
> > 
> > Not a strong opinion, just an idea.
> > 
> I see your point. Yes, that is possible to return "bool" value that
> indicates left or right path. After that we can detect the direction.
> 
> From the other hand, we end up and access the correct link anyway during
> the traversal the tree. In case of "bool" way, we will need to add on top
> some extra logic that checks where to attach to.

Sure, makes sense. I'd add some comments here then.

Thanks!
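As a footnote to the link-vs-bool question above: the "return the address of the child pointer" pattern can be shown with a toy tree in plain userspace C. This is an illustrative sketch only, not __find_va_links() itself; names and the key-only comparison are made up. The point is that returning the link lets the caller attach the new node with a single store, with no extra left/right check.

```c
#include <assert.h>
#include <stddef.h>

/* Toy binary-tree node, keyed by a single value;
 * stands in for the rbtree node in the discussion. */
struct tnode {
	unsigned long key;
	struct tnode *left, *right;
};

/*
 * Find where @key would be attached: walk down from *root and
 * return the address of the NULL child pointer (left or right of
 * *parent) that the new node should go into. The caller then just
 * does "*link = new_node" - no need to re-check the direction.
 */
static struct tnode **find_links(struct tnode **root, unsigned long key,
				 struct tnode **parent)
{
	struct tnode **link = root;

	*parent = NULL;
	while (*link) {
		*parent = *link;
		if (key < (*link)->key)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	return link;
}
```

With the bool-returning variant, the caller would need an extra "if (left) parent->left = new; else parent->right = new;" step, which is exactly the logic the returned link already encodes.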