Hi Liam,

On 6/3/2023 2:55 AM, Liam R. Howlett wrote:
> * Yin, Fengwei <fengwei.yin@xxxxxxxxx> [230602 04:11]:
>> Hi Liam,
>>
>> On 6/1/2023 10:15 AM, Liam R. Howlett wrote:
>>> Initial work on preallocations showed no regression in performance
>>> during testing, but recently some users (both on [1] and off [android]
>>> list) have reported that preallocating the worst-case number of nodes
>>> has caused some slow down. This patch set addresses the number of
>>> allocations in a few ways.
>>>
>>> During munmap() most munmap() operations will remove a single VMA, so
>>> leverage the fact that the maple tree can place a single pointer at
>>> range 0 - 0 without allocating. This is done by changing the index in
>>> the 'sidetree'.
>>>
>>> Re-introduce the entry argument to mas_preallocate() so that a more
>>> intelligent guess of the node count can be made.
>>>
>>> Patches are in the following order:
>>> 0001-0002: Testing framework for benchmarking some operations
>>> 0003-0004: Reduction of maple node allocation in sidetree
>>> 0005: Small cleanup of do_vmi_align_munmap()
>>> 0006-0013: mas_preallocate() calculation change
>>> 0014: Change the vma iterator order
>> I did run The AIM:page_test on an IceLake 48C/96T + 192G RAM platform with
>> this patchset.
>>
>> The result has a little bit improvement:
>> Base (next-20230602):
>> 503880
>> Base with this patchset:
>> 519501
>>
>> But they are far from the none-regression result (commit 7be1c1a3c7b1):
>> 718080
>>
>>
>> Some other information I collected:
>> With Base, the mas_alloc_nodes are always hit with request: 7.
>> With this patchset, the request are 1 or 5.
>>
>> I suppose this is the reason for improvement from 503880 to 519501.
>>
>> With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags never triggered
>> mas_alloc_nodes() call. Thanks.
>
> Thanks for retesting. I've not been able to see the regression myself.
> Are you running in a VM of sorts? Android and some cloud VMs seem to
I didn't run it in a VM.
I ran it in a native environment.

> see this, but I do not in kvm or the server I test on.
>
> I am still looking to reduce/reverse the regression and a reproducer on
> my end would help.
The test is page_test from AIM9. You can get the AIM9 test suite from:
http://nchc.dl.sourceforge.net/project/aimbench/aim-suite9

After building it, you will get an executable named singleuser. It needs a
text file named s9workfile to define the test case. The s9workfile I am
using has the following content:

# @(#) s9workfile:1.2 1/22/96 00:00:00
# AIM Independent Resource Benchmark - Suite IX Workfile
FILESIZE: 5M
page_test

Then you can run the test with the command:
./singleuser -nl

It will ask some configuration questions and then run the actual test.

One thing that needs attention: create-clo.c has this line:
    newbrk = sbrk(-4096 * 16);

It should be changed to:
    intptr_t inc = -4096 * 16;
    newbrk = sbrk(inc);

Otherwise, -4096 * 16 is treated as a 32-bit value and the call ends up
extending brk by around 4G. If the machine doesn't have enough RAM, the
brk syscall will fail.

If you hit any issue running the test, just ping me. Thanks.


Regards
Yin, Fengwei

>
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@xxxxxxxxx/
>>>
>>> Liam R. Howlett (14):
>>>   maple_tree: Add benchmarking for mas_for_each
>>>   maple_tree: Add benchmarking for mas_prev()
>>>   mm: Move unmap_vmas() declaration to internal header
>>>   mm: Change do_vmi_align_munmap() side tree index
>>>   mm: Remove prev check from do_vmi_align_munmap()
>>>   maple_tree: Introduce __mas_set_range()
>>>   mm: Remove re-walk from mmap_region()
>>>   maple_tree: Re-introduce entry to mas_preallocate() arguments
>>>   mm: Use vma_iter_clear_gfp() in nommu
>>>   mm: Set up vma iterator for vma_iter_prealloc() calls
>>>   maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null()
>>>   maple_tree: Update mas_preallocate() testing
>>>   maple_tree: Refine mas_preallocate() node calculations
>>>   mm/mmap: Change vma iteration order in do_vmi_align_munmap()
>>>
>>>  fs/exec.c                        |   1 +
>>>  include/linux/maple_tree.h       |  23 ++++-
>>>  include/linux/mm.h               |   4 -
>>>  lib/maple_tree.c                 |  78 ++++++++++----
>>>  lib/test_maple_tree.c            |  74 +++++++++++++
>>>  mm/internal.h                    |  40 ++++++--
>>>  mm/memory.c                      |  16 ++-
>>>  mm/mmap.c                        | 171 ++++++++++++++++---------------
>>>  mm/nommu.c                       |  45 ++++----
>>>  tools/testing/radix-tree/maple.c |  59 ++++++-----
>>>  10 files changed, 331 insertions(+), 180 deletions(-)
>>>