Re: [PATCH 00/14] Reduce preallocations for maple tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2023/6/5 14:18, Yin, Fengwei 写道:


On 6/5/2023 12:41 PM, Yin Fengwei wrote:
Hi Peng,

On 6/5/23 11:28, Peng Zhang wrote:


在 2023/6/2 16:10, Yin, Fengwei 写道:
Hi Liam,

On 6/1/2023 10:15 AM, Liam R. Howlett wrote:
Initial work on preallocations showed no regression in performance
during testing, but recently some users (both on [1] and off [android]
list) have reported that preallocating the worst-case number of nodes
has caused some slow down.  This patch set addresses the number of
allocations in a few ways.

During munmap() most munmap() operations will remove a single VMA, so
leverage the fact that the maple tree can place a single pointer at
range 0 - 0 without allocating.  This is done by changing the index in
the 'sidetree'.

Re-introduce the entry argument to mas_preallocate() so that a more
intelligent guess of the node count can be made.

Patches are in the following order:
0001-0002: Testing framework for benchmarking some operations
0003-0004: Reduction of maple node allocation in sidetree
0005:      Small cleanup of do_vmi_align_munmap()
0006-0013: mas_preallocate() calculation change
0014:      Change the vma iterator order
I did run The AIM:page_test on an IceLake 48C/96T + 192G RAM platform with
this patchset.

The result has a little bit improvement:
Base (next-20230602):
    503880
Base with this patchset:
    519501

But they are far from the none-regression result (commit 7be1c1a3c7b1):
    718080


Some other information I collected:
With Base, the mas_alloc_nodes are always hit with request: 7.
With this patchset, the request are 1 or 5.

I suppose this is the reason for improvement from 503880 to 519501.

With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags never triggered
mas_alloc_nodes() call. Thanks.
Hi Fengwei,

I think it may be related to the inaccurate number of nodes allocated
in the pre-allocation. I slightly modified the pre-allocation in this
patchset, but I don't know if it works. It would be great if you could
help test it, and help pinpoint the cause. Below is the diff, which can
be applied based on this pachset.
I tried the patch, it could eliminate the call of mas_alloc_nodes() during
the test. But the result of benchmark got a little bit improvement:
   529040

But it's still much less than none-regression result. I will also double
confirm the none-regression result.
Just noticed that the commit f5715584af95 make validate_mm() two implementation
based on CONFIG_DEBUG_VM instead of CONFIG_DEBUG_VM_MAPPLE_TREE). I have
CONFIG_DEBUG_VM but not CONFIG_DEBUG_VM_MAPPLE_TREE defined. So it's not an
apple to apple.


I disable CONFIG_DEBUG_VM and re-run the test and got:
Before preallocation change (7be1c1a3c7b1):
     770100
After preallocation change (28c5609fb236):
     680000
With liam's fix:
     702100
plus Peng's fix:
     725900
Thank you for your test, now it seems that the performance
regression is not so much.


Regards
Yin, Fengwei



Regards
Yin, Fengwei


Thanks,
Peng

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5ea211c3f186..e67bf2744384 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5575,9 +5575,11 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
          goto ask_now;
      }

-    /* New root needs a singe node */
-    if (unlikely(mte_is_root(mas->node)))
-        goto ask_now;
+    if ((node_size == wr_mas.node_end + 1 &&
+         mas->offset == wr_mas.node_end) ||
+        (node_size == wr_mas.node_end &&
+         wr_mas.offset_end - mas->offset == 1))
+        return 0;

      /* Potential spanning rebalance collapsing a node, use worst-case */
      if (node_size  - 1 <= mt_min_slots[wr_mas.type])
@@ -5590,7 +5592,6 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
      if (likely(!mas_is_err(mas)))
          return 0;

-    mas_set_alloc_req(mas, 0);
      ret = xa_err(mas->node);
      mas_reset(mas);
      mas_destroy(mas);




Regards
Yin, Fengwei


[1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@xxxxxxxxx/

Liam R. Howlett (14):
    maple_tree: Add benchmarking for mas_for_each
    maple_tree: Add benchmarking for mas_prev()
    mm: Move unmap_vmas() declaration to internal header
    mm: Change do_vmi_align_munmap() side tree index
    mm: Remove prev check from do_vmi_align_munmap()
    maple_tree: Introduce __mas_set_range()
    mm: Remove re-walk from mmap_region()
    maple_tree: Re-introduce entry to mas_preallocate() arguments
    mm: Use vma_iter_clear_gfp() in nommu
    mm: Set up vma iterator for vma_iter_prealloc() calls
    maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null()
    maple_tree: Update mas_preallocate() testing
    maple_tree: Refine mas_preallocate() node calculations
    mm/mmap: Change vma iteration order in do_vmi_align_munmap()

   fs/exec.c                        |   1 +
   include/linux/maple_tree.h       |  23 ++++-
   include/linux/mm.h               |   4 -
   lib/maple_tree.c                 |  78 ++++++++++----
   lib/test_maple_tree.c            |  74 +++++++++++++
   mm/internal.h                    |  40 ++++++--
   mm/memory.c                      |  16 ++-
   mm/mmap.c                        | 171 ++++++++++++++++---------------
   mm/nommu.c                       |  45 ++++----
   tools/testing/radix-tree/maple.c |  59 ++++++-----
   10 files changed, 331 insertions(+), 180 deletions(-)





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux