Re: [PATCH] mm: vmalloc: Use the vmap_area_lock to protect ne_fit_preload_node

Uladzislau Rezki <urezki@xxxxxxxxx> · Mon, 7 Oct 2019 23:44:20 +0200

On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > Actually there is a high lock contention on vmap_area_lock, because it
> > is still global. You can have a look at last slide:
> > 
> > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > 
> > so this change will make it a bit higher. From the other hand i agree
> > that for rt it should be fixed, probably it could be done like:
> > 
> > ifdef PREEMPT_RT
> >     migrate_disable()
> > #else
> >     preempt_disable()
> > ...
> > 
> > but i am not sure it is good either.
> 
> What is to be expected on average? Is the lock acquired and then
> released again because the slot is empty and memory needs to be
> allocated or can it be assumed that this hardly happens? 
> 
The lock is not released (we are not allowed to drop it there); instead
we just try to allocate with the GFP_NOWAIT flag. That can happen if the
earlier preallocation with the GFP_KERNEL flag has failed:

<snip>
...
 } else if (type == NE_FIT_TYPE) {
  /*
   * Split no edge of fit VA.
   *
   *     |       |
   *   L V  NVA  V R
   * |---|-------|---|
   */
  lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
  if (unlikely(!lva)) {
      ...
      lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
      ...
  }
...
<snip>

How often do we need an extra object for a split? It depends on the
workload. For example, the fork() path follows that pattern.

I think we can assume that migration hardly ever happens and should be
considered a rare case. Thus we can do the preloading without worrying
much if it occurs:

<snip>
urezki@pc636:~/data/ssd/coding/linux-stable$ git diff

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..bc782edcd1fd 100644 
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
         * Even if it fails we do not really care about that. Just proceed
         * as it is. "overflow" path will refill the cache we allocate from.
         */
-       preempt_disable();
-       if (!__this_cpu_read(ne_fit_preload_node)) {
-               preempt_enable();
+       if (!this_cpu_read(ne_fit_preload_node)) {
                pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-               preempt_disable();

-               if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+               if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
                        if (pva)
                                kmem_cache_free(vmap_area_cachep, pva);
                }
        }
 
        spin_lock(&vmap_area_lock);
-       preempt_enable();

        /*
         * If an allocation fails, the "vend" address is
urezki@pc636:~/data/ssd/coding/linux-stable$
<snip>

So we do not guarantee that the CPU is preloaded; instead we minimize
the number of allocations done with the GFP_NOWAIT flag. For example,
on my 4-CPU system I am not even able to trigger the case when a CPU is
not preloaded.
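
To illustrate the idea outside the kernel, here is a minimal userspace
sketch. It is only an analogy, not the mm/vmalloc.c code: a single C11
atomic slot stands in for the per-CPU ne_fit_preload_node, malloc()
stands in for both the GFP_KERNEL and the GFP_NOWAIT allocations, and
the names preload_slot, preload(), take_for_split() and fallback_allocs
are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmap_area. */
struct node { int dummy; };

/* One slot models one CPU's ne_fit_preload_node (assumption). */
static _Atomic(struct node *) preload_slot;

/* Counts how often the fallback (GFP_NOWAIT analogue) was needed. */
static int fallback_allocs;

/* Runs before the lock is taken, where a sleeping allocation is fine. */
static void preload(void)
{
	struct node *expected = NULL;
	struct node *n;

	if (atomic_load(&preload_slot))
		return;

	n = malloc(sizeof(*n));

	/*
	 * If the allocation failed, or somebody filled the slot in the
	 * meantime, drop our object. free(NULL) is a no-op.
	 */
	if (!n || !atomic_compare_exchange_strong(&preload_slot, &expected, n))
		free(n);
}

/* Runs with the lock held: take the preloaded object or fall back. */
static struct node *take_for_split(void)
{
	struct node *n = atomic_exchange(&preload_slot, NULL);

	if (!n) {
		/* Slot was empty: the rare path we try to minimize. */
		fallback_allocs++;
		n = malloc(sizeof(*n));
	}

	return n;
}
```

Calling preload() and then take_for_split() uses the preloaded object
without touching the fallback; a second take_for_split() without a
preload() in between goes through the fallback path. Nothing is
guaranteed here either, the fallback is just made unlikely.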

I can test it tomorrow on my 12-CPU system to see how it behaves there.

--
Vlad Rezki