On Mon, Oct 22, 2018 at 04:50:06PM +0200, Uladzislau Rezki wrote: > On Fri, Oct 19, 2018 at 05:11:45PM -0700, Joel Fernandes wrote: > > On Fri, Oct 19, 2018 at 07:35:36PM +0200, Uladzislau Rezki (Sony) wrote: > > > Objective > > > --------- > > > Initiative of improving vmalloc allocator comes from getting many issues > > > related to allocation time, i.e. sometimes it is terribly slow. As a result > > > many workloads which are sensitive for long (more than 1 millisecond) preemption > > > off scenario are affected by that slowness(test cases like UI or audio, etc.). > > > > > > The problem is that, currently an allocation of the new VA area is done over > > > busy list iteration until a suitable hole is found between two busy areas. > > > Therefore each new allocation causes the list being grown. Due to long list > > > and different permissive parameters an allocation can take a long time on > > > embedded devices(milliseconds). > > > > I am not super familiar with the vmap allocation code, it has been some > > years. But I have 2 comments: > > > > (1) It seems the issue you are reporting is the walking of the list in > > alloc_vmap_area(). > > > > Can we not solve this by just simplifying the following code? > > > > /* from the starting point, walk areas until a suitable hole is found > > */ > > while (addr + size > first->va_start && addr + size <= vend) { > > if (addr + cached_hole_size < first->va_start) > > cached_hole_size = first->va_start - addr; > > addr = ALIGN(first->va_end, align); > > if (addr + size < addr) > > goto overflow; > > > > if (list_is_last(&first->list, &vmap_area_list)) > > goto found; > > > > first = list_next_entry(first, list); > > } > > > > Instead of going through the vmap_area_list, can we not just binary search > > the existing address-sorted vmap_area_root rbtree to find a hole? If yes, > > that would bring down the linear search overhead. If not, why not? > > > vmap_area_root rb-tree is used for fast access to vmap_area knowing > the address(any va_start). That is why we use the tree. To use that tree > in order to check holes will require to start from the left most node or > specified "vstart" and move forward by rb_next(). What is much slower > than regular(list_next_entry O(1)) access in this case. Ah, sorry. Don't know what I was thinking, you are right. By the way the binder driver does something similar too for buffer allocations, maintains an rb tree of free areas: https://github.com/torvalds/linux/blob/master/drivers/android/binder_alloc.c#L415 > > (2) I am curious, do you have any measurements of how much time > > alloc_vmap_area() is taking? You mentioned it takes milliseconds but I was > > wondering if you had more finer grained function profiling measurements. And > > also any data on how big are the lists at the time you see this issue. > > > Basically it depends on how much or heavily your system uses vmalloc > allocations. I was using CONFIG_DEBUG_PREEMPT with an extra patch. See it > here: ftp://vps418301.ovh.net/incoming/0001-tracing-track-preemption-disable-callers.patch > > As for list size. It can be easily thousands. Understood. I will go through your patches more in the coming days, thanks! - Joel