On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > Actually there is a high lock contention on vmap_area_lock, because it
> > is still global. You can have a look at last slide:
> >
> > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> >
> > so this change will make it a bit higher. From the other hand i agree
> > that for rt it should be fixed, probably it could be done like:
> >
> > ifdef PREEMPT_RT
> >     migrate_disable()
> > #else
> >     preempt_disable()
> > ...
> >
> > but i am not sure it is good either.
>
> What is to be expected on average? Is the lock acquired and then
> released again because the slot is empty and memory needs to be
> allocated or can it be assumed that this hardly happens?
>
The lock is not released (we are not allowed to drop it there); instead
we just try to allocate with the GFP_NOWAIT flag. That path is taken
only if the earlier preallocation with the GFP_KERNEL flag has failed:

<snip>
	...
	} else if (type == NE_FIT_TYPE) {
		/*
		 * Split no edge of fit VA.
		 *
		 *     |       |
		 *   L V  NVA  V R
		 *   |---|-------|---|
		 */
		lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
		if (unlikely(!lva)) {
			...
			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
			...
		}
		...
<snip>

How often do we need an extra object for the split purpose? The answer
is: it depends on the workload. For example, the fork() path falls into
that pattern.

I think we can assume that migration hardly ever happens here, so it
can be considered a rare case. Thus we can do the preloading without
worrying much about it:

<snip>
urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..bc782edcd1fd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Even if it fails we do not really care about that. Just proceed
 	 * as it is. "overflow" path will refill the cache we allocate from.
 	 */
-	preempt_disable();
-	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
+	if (!this_cpu_read(ne_fit_preload_node)) {
 		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
 
-		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+		if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
 			if (pva)
 				kmem_cache_free(vmap_area_cachep, pva);
 		}
 	}
 
 	spin_lock(&vmap_area_lock);
-	preempt_enable();
 
 	/*
 	 * If an allocation fails, the "vend" address is
urezki@pc636:~/data/ssd/coding/linux-stable$
<snip>

So, we do not guarantee that a preloaded object is always available;
instead we minimize the number of allocations done with the GFP_NOWAIT
flag. For example, on my 4xCPU box I am not even able to trigger the
case where a CPU is not preloaded. I can test it tomorrow on my 12xCPU
box to see how it behaves there.

--
Vlad Rezki
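
P.S. Just to illustrate the idea in isolation, below is a minimal
user-space sketch of the preloading pattern discussed above. It is not
the real mm/vmalloc.c code: the helpers prealloc_node() and
alloc_nowait(), the pthread mutex and the __thread pointer are made up
and only stand in for the GFP_KERNEL preallocation, the GFP_NOWAIT
fallback, vmap_area_lock and the per-CPU ne_fit_preload_node slot.

<snip>
/*
 * Illustration only: user-space approximation of the preloading idea.
 *
 * - prealloc_node() stands for the GFP_KERNEL preallocation done
 *   outside of the lock (it is allowed to "sleep").
 * - alloc_nowait() stands for the GFP_NOWAIT fallback taken under the
 *   lock when no preloaded object is available.
 * - The __thread pointer stands for the per-CPU ne_fit_preload_node.
 *   A user-space thread cannot "migrate" away from its own TLS, so the
 *   this_cpu_cmpxchg() subtlety of the kernel code is not reproduced.
 *
 * Build: cc -pthread sketch.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

struct node { long data; };

static __thread struct node *preload_node;	/* "ne_fit_preload_node" */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* "vmap_area_lock" */

/* May block; plays the role of kmem_cache_alloc_node(GFP_KERNEL). */
static struct node *prealloc_node(void)
{
	return malloc(sizeof(struct node));
}

/* Plays the role of kmem_cache_alloc(GFP_NOWAIT); may fail. */
static struct node *alloc_nowait(void)
{
	return malloc(sizeof(struct node));
}

static struct node *get_split_node(void)
{
	struct node *n;

	/* Preload outside of the lock; a failure here is not fatal. */
	if (!preload_node)
		preload_node = prealloc_node();

	pthread_mutex_lock(&lock);

	/* Consume the preloaded object or fall back to "NOWAIT". */
	n = preload_node;
	preload_node = NULL;
	if (!n)
		n = alloc_nowait();

	pthread_mutex_unlock(&lock);
	return n;
}

int main(void)
{
	struct node *n = get_split_node();

	printf("got node: %p\n", (void *)n);
	free(n);
	return 0;
}
<snip>

The only point of the sketch is that the potentially expensive
allocation happens before the lock is taken, and the NOWAIT fallback
under the lock is hit only when the preload slot turns out to be empty.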