On 6/5/2023 12:41 PM, Yin Fengwei wrote: > Hi Peng, > > On 6/5/23 11:28, Peng Zhang wrote: >> >> >> 在 2023/6/2 16:10, Yin, Fengwei 写道: >>> Hi Liam, >>> >>> On 6/1/2023 10:15 AM, Liam R. Howlett wrote: >>>> Initial work on preallocations showed no regression in performance >>>> during testing, but recently some users (both on [1] and off [android] >>>> list) have reported that preallocating the worst-case number of nodes >>>> has caused some slow down. This patch set addresses the number of >>>> allocations in a few ways. >>>> >>>> During munmap() most munmap() operations will remove a single VMA, so >>>> leverage the fact that the maple tree can place a single pointer at >>>> range 0 - 0 without allocating. This is done by changing the index in >>>> the 'sidetree'. >>>> >>>> Re-introduce the entry argument to mas_preallocate() so that a more >>>> intelligent guess of the node count can be made. >>>> >>>> Patches are in the following order: >>>> 0001-0002: Testing framework for benchmarking some operations >>>> 0003-0004: Reduction of maple node allocation in sidetree >>>> 0005: Small cleanup of do_vmi_align_munmap() >>>> 0006-0013: mas_preallocate() calculation change >>>> 0014: Change the vma iterator order >>> I did run The AIM:page_test on an IceLake 48C/96T + 192G RAM platform with >>> this patchset. >>> >>> The result has a little bit improvement: >>> Base (next-20230602): >>> 503880 >>> Base with this patchset: >>> 519501 >>> >>> But they are far from the none-regression result (commit 7be1c1a3c7b1): >>> 718080 >>> >>> >>> Some other information I collected: >>> With Base, the mas_alloc_nodes are always hit with request: 7. >>> With this patchset, the request are 1 or 5. >>> >>> I suppose this is the reason for improvement from 503880 to 519501. >>> >>> With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags never triggered >>> mas_alloc_nodes() call. Thanks. >> Hi Fengwei, >> >> I think it may be related to the inaccurate number of nodes allocated >> in the pre-allocation. I slightly modified the pre-allocation in this >> patchset, but I don't know if it works. It would be great if you could >> help test it, and help pinpoint the cause. Below is the diff, which can >> be applied based on this pachset. > I tried the patch, it could eliminate the call of mas_alloc_nodes() during > the test. But the result of benchmark got a little bit improvement: > 529040 > > But it's still much less than none-regression result. I will also double > confirm the none-regression result. Just noticed that the commit f5715584af95 make validate_mm() two implementation based on CONFIG_DEBUG_VM instead of CONFIG_DEBUG_VM_MAPPLE_TREE). I have CONFIG_DEBUG_VM but not CONFIG_DEBUG_VM_MAPPLE_TREE defined. So it's not an apple to apple. I disable CONFIG_DEBUG_VM and re-run the test and got: Before preallocation change (7be1c1a3c7b1): 770100 After preallocation change (28c5609fb236): 680000 With liam's fix: 702100 plus Peng's fix: 725900 Regards Yin, Fengwei > > > Regards > Yin, Fengwei > >> >> Thanks, >> Peng >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index 5ea211c3f186..e67bf2744384 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -5575,9 +5575,11 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp) >> goto ask_now; >> } >> >> - /* New root needs a singe node */ >> - if (unlikely(mte_is_root(mas->node))) >> - goto ask_now; >> + if ((node_size == wr_mas.node_end + 1 && >> + mas->offset == wr_mas.node_end) || >> + (node_size == wr_mas.node_end && >> + wr_mas.offset_end - mas->offset == 1)) >> + return 0; >> >> /* Potential spanning rebalance collapsing a node, use worst-case */ >> if (node_size - 1 <= mt_min_slots[wr_mas.type]) >> @@ -5590,7 +5592,6 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp) >> if (likely(!mas_is_err(mas))) >> return 0; >> >> - mas_set_alloc_req(mas, 0); >> ret = xa_err(mas->node); >> mas_reset(mas); >> mas_destroy(mas); >> >> >>> >>> >>> Regards >>> Yin, Fengwei >>> >>>> >>>> [1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@xxxxxxxxx/ >>>> >>>> Liam R. Howlett (14): >>>> maple_tree: Add benchmarking for mas_for_each >>>> maple_tree: Add benchmarking for mas_prev() >>>> mm: Move unmap_vmas() declaration to internal header >>>> mm: Change do_vmi_align_munmap() side tree index >>>> mm: Remove prev check from do_vmi_align_munmap() >>>> maple_tree: Introduce __mas_set_range() >>>> mm: Remove re-walk from mmap_region() >>>> maple_tree: Re-introduce entry to mas_preallocate() arguments >>>> mm: Use vma_iter_clear_gfp() in nommu >>>> mm: Set up vma iterator for vma_iter_prealloc() calls >>>> maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null() >>>> maple_tree: Update mas_preallocate() testing >>>> maple_tree: Refine mas_preallocate() node calculations >>>> mm/mmap: Change vma iteration order in do_vmi_align_munmap() >>>> >>>> fs/exec.c | 1 + >>>> include/linux/maple_tree.h | 23 ++++- >>>> include/linux/mm.h | 4 - >>>> lib/maple_tree.c | 78 ++++++++++---- >>>> lib/test_maple_tree.c | 74 +++++++++++++ >>>> mm/internal.h | 40 ++++++-- >>>> mm/memory.c | 16 ++- >>>> mm/mmap.c | 171 ++++++++++++++++--------------- >>>> mm/nommu.c | 45 ++++---- >>>> tools/testing/radix-tree/maple.c | 59 ++++++----- >>>> 10 files changed, 331 insertions(+), 180 deletions(-) >>>>