* Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [230605 03:59]: > > > 在 2023/6/5 14:18, Yin, Fengwei 写道: > > > > > > On 6/5/2023 12:41 PM, Yin Fengwei wrote: > > > Hi Peng, > > > > > > On 6/5/23 11:28, Peng Zhang wrote: > > > > > > > > > > > > 在 2023/6/2 16:10, Yin, Fengwei 写道: > > > > > Hi Liam, > > > > > > > > > > On 6/1/2023 10:15 AM, Liam R. Howlett wrote: > > > > > > Initial work on preallocations showed no regression in performance > > > > > > during testing, but recently some users (both on [1] and off [android] > > > > > > list) have reported that preallocating the worst-case number of nodes > > > > > > has caused some slow down. This patch set addresses the number of > > > > > > allocations in a few ways. > > > > > > > > > > > > During munmap() most munmap() operations will remove a single VMA, so > > > > > > leverage the fact that the maple tree can place a single pointer at > > > > > > range 0 - 0 without allocating. This is done by changing the index in > > > > > > the 'sidetree'. > > > > > > > > > > > > Re-introduce the entry argument to mas_preallocate() so that a more > > > > > > intelligent guess of the node count can be made. > > > > > > > > > > > > Patches are in the following order: > > > > > > 0001-0002: Testing framework for benchmarking some operations > > > > > > 0003-0004: Reduction of maple node allocation in sidetree > > > > > > 0005: Small cleanup of do_vmi_align_munmap() > > > > > > 0006-0013: mas_preallocate() calculation change > > > > > > 0014: Change the vma iterator order > > > > > I did run The AIM:page_test on an IceLake 48C/96T + 192G RAM platform with > > > > > this patchset. > > > > > > > > > > The result has a little bit improvement: > > > > > Base (next-20230602): > > > > > 503880 > > > > > Base with this patchset: > > > > > 519501 > > > > > > > > > > But they are far from the none-regression result (commit 7be1c1a3c7b1): > > > > > 718080 > > > > > > > > > > > > > > > Some other information I collected: > > > > > With Base, the mas_alloc_nodes are always hit with request: 7. > > > > > With this patchset, the request are 1 or 5. > > > > > > > > > > I suppose this is the reason for improvement from 503880 to 519501. > > > > > > > > > > With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags never triggered > > > > > mas_alloc_nodes() call. Thanks. > > > > Hi Fengwei, > > > > > > > > I think it may be related to the inaccurate number of nodes allocated > > > > in the pre-allocation. I slightly modified the pre-allocation in this > > > > patchset, but I don't know if it works. It would be great if you could > > > > help test it, and help pinpoint the cause. Below is the diff, which can > > > > be applied based on this pachset. > > > I tried the patch, it could eliminate the call of mas_alloc_nodes() during > > > the test. But the result of benchmark got a little bit improvement: > > > 529040 > > > > > > But it's still much less than none-regression result. I will also double > > > confirm the none-regression result. > > Just noticed that the commit f5715584af95 make validate_mm() two implementation > > based on CONFIG_DEBUG_VM instead of CONFIG_DEBUG_VM_MAPPLE_TREE). I have > > CONFIG_DEBUG_VM but not CONFIG_DEBUG_VM_MAPPLE_TREE defined. So it's not an > > apple to apple. You mean "mm: update validate_mm() to use vma iterator" here I guess. I have it as a different commit id in my branch. I 'restored' some of the checking because I was able to work around not having the mt_dump() definition with the vma iterator. I'm now wondering how wide spread CONFIG_DEBUG_VM is used and if I should not have added these extra checks. > > > > > > I disable CONFIG_DEBUG_VM and re-run the test and got: > > Before preallocation change (7be1c1a3c7b1): > > 770100 > > After preallocation change (28c5609fb236): > > 680000 > > With liam's fix: > > 702100 > > plus Peng's fix: > > 725900 > Thank you for your test, now it seems that the performance > regression is not so much. We are also too strict on the reset during mas_store_prealloc() checking for a spanning write. I have a fix for this for v2 of the patch set, although I suspect it will not make a huge difference. > > > > > > Regards > > Yin, Fengwei > > > > > > > > > > > Regards > > > Yin, Fengwei > > > > > > > > > > > Thanks, > > > > Peng > > > > > > > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > > > > index 5ea211c3f186..e67bf2744384 100644 > > > > --- a/lib/maple_tree.c > > > > +++ b/lib/maple_tree.c > > > > @@ -5575,9 +5575,11 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp) > > > > goto ask_now; > > > > } > > > > > > > > - /* New root needs a singe node */ > > > > - if (unlikely(mte_is_root(mas->node))) > > > > - goto ask_now; Why did you drop this? If we are creating a new root we will only need one node. > > > > + if ((node_size == wr_mas.node_end + 1 && > > > > + mas->offset == wr_mas.node_end) || > > > > + (node_size == wr_mas.node_end && > > > > + wr_mas.offset_end - mas->offset == 1)) > > > > + return 0; I will add this to v2 as well, or something similar. > > > > > > > > /* Potential spanning rebalance collapsing a node, use worst-case */ > > > > if (node_size - 1 <= mt_min_slots[wr_mas.type]) > > > > @@ -5590,7 +5592,6 @@ int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp) > > > > if (likely(!mas_is_err(mas))) > > > > return 0; > > > > > > > > - mas_set_alloc_req(mas, 0); Why did you drop this? It seems like a worth while cleanup on failure. > > > > ret = xa_err(mas->node); > > > > mas_reset(mas); > > > > mas_destroy(mas); > > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > Yin, Fengwei > > > > > > > > > > > > > > > > > [1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@xxxxxxxxx/ > > > > > > > > > > > > Liam R. Howlett (14): > > > > > > maple_tree: Add benchmarking for mas_for_each > > > > > > maple_tree: Add benchmarking for mas_prev() > > > > > > mm: Move unmap_vmas() declaration to internal header > > > > > > mm: Change do_vmi_align_munmap() side tree index > > > > > > mm: Remove prev check from do_vmi_align_munmap() > > > > > > maple_tree: Introduce __mas_set_range() > > > > > > mm: Remove re-walk from mmap_region() > > > > > > maple_tree: Re-introduce entry to mas_preallocate() arguments > > > > > > mm: Use vma_iter_clear_gfp() in nommu > > > > > > mm: Set up vma iterator for vma_iter_prealloc() calls > > > > > > maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null() > > > > > > maple_tree: Update mas_preallocate() testing > > > > > > maple_tree: Refine mas_preallocate() node calculations > > > > > > mm/mmap: Change vma iteration order in do_vmi_align_munmap() > > > > > > > > > > > > fs/exec.c | 1 + > > > > > > include/linux/maple_tree.h | 23 ++++- > > > > > > include/linux/mm.h | 4 - > > > > > > lib/maple_tree.c | 78 ++++++++++---- > > > > > > lib/test_maple_tree.c | 74 +++++++++++++ > > > > > > mm/internal.h | 40 ++++++-- > > > > > > mm/memory.c | 16 ++- > > > > > > mm/mmap.c | 171 ++++++++++++++++--------------- > > > > > > mm/nommu.c | 45 ++++---- > > > > > > tools/testing/radix-tree/maple.c | 59 ++++++----- > > > > > > 10 files changed, 331 insertions(+), 180 deletions(-) > > > > > >