On 11/19/24 3:59 AM, Wei Yang wrote:
On Thu, Nov 14, 2024 at 04:39:00PM -0500, Sid Kumar wrote:
On 11/14/24 12:05 PM, Sidhartha Kumar wrote:
[...]
================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running the ./mmap1_processes test from mmtests. The two paths
for allocation are requests for a single node and the bulk allocation path.
The histograms show the number of calls to these paths and the
distribution of the number of nodes requested on the bulk allocation path.
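As a note on methodology: bpftrace's hist() groups values into power-of-two buckets, which is why the ranges below read [2, 4), [4, 8), [8, 16). A minimal C model of that bucketing (our own sketch, not bpftrace code):

```c
#include <assert.h>

/*
 * Model of bpftrace hist() power-of-two bucketing: returns the lower
 * bound of the bucket a value falls into, so 9 lands in [8, 16).
 */
static unsigned long hist_bucket_lo(unsigned long nodes_requested)
{
	unsigned long lo = 1;

	while (lo * 2 <= nodes_requested)
		lo *= 2;
	return lo;	/* the bucket is [lo, 2 * lo) */
}
```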
mm-unstable 11/13/24
@bulk_alloc_req:
[2, 4) 10 |@@@@@@@@@@@@@ |
[4, 8) 38 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@ |
mm-unstable 11/13/24 + this series
@bulk_alloc_req:
[2, 4) 9 |@@@@@@@@@@ |
[4, 8) 43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16) 15 |@@@@@@@@@@@@@@@@@@ |
We can see the worst case bulk allocations of [8,16) nodes are reduced after
this series.
From running the ./malloc1_threads test case, we eliminate almost all
bulk allocation requests that fall in the [8, 16) node range.
./malloc1_threads -t 8 -s 100
mm-unstable + this series
@bulk_alloc_req:
[2, 4)     2 |                                                    |
[4, 8)  3381 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)    2 |                                                    |
This is impressive, but one thing is not clear to me.

For mmap related code, we usually have the following usage:

  vma_iter_prealloc(vmi, vma)
    mas_preallocate(vmi->mas, vma)
      MA_WR_STATE(wr_mas, );
      mas_wr_store_type(&wr_mas);      --- (1)
  vma_iter_store(vmi, vma);
Location (1) is where we try to get a better estimate of the allocations
needed. The estimate is based on walking down the tree to find a suitable
node.
In mmap related code, we have usually already walked down the tree to a
leaf, via vma_find() or a related iteration operation, and mas.status is
set to ma_active. So I don't expect mas_preallocate() to traverse the
tree again.

But from your results, it seems most cases do traverse the tree again to
get a more precise height.
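For illustration, here is a rough model of why a more precise walk can shrink the worst case: a split only propagates upward until it reaches a node with a vacant slot, so the nodes that must be preallocated are bounded by the distance to the lowest vacant ancestor rather than the full height. This is a hypothetical sketch of the idea behind "use vacant nodes to reduce worst case allocations", not the kernel code:

```c
#include <assert.h>

/*
 * Hypothetical model: levels are numbered 0 (root) .. height - 1 (leaf),
 * and slots_used[i] is the number of occupied slots at level i. A store
 * that overfills the leaf splits it; each split pushes one new entry
 * into the parent and stops at the first ancestor with a vacant slot.
 */
static int worst_case_new_nodes(const int *slots_used, int height,
				int slots_per_node)
{
	int level, splits = 0;

	for (level = height - 1; level >= 0; level--) {
		if (slots_used[level] < slots_per_node)
			return splits;	/* vacancy absorbs the insertion */
		splits++;		/* this level splits: one new node */
	}
	return splits + 1;		/* all full: tree grows a new root */
}
```

With a vacancy recorded anywhere on the walk, the estimate drops below the pessimistic height-based bound, which is consistent with bulk requests shifting out of the [8, 16) bucket.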
Hello,
From looking at mas_wr_prealloc_setup(), when mas_is_active() there are
two scenarios where we reset:

	if (mas->last > mas->max)
		goto reset;

	if (wr_mas->entry)
		goto set_content;

	if (mte_is_leaf(mas->node) && mas->last == mas->max)
		goto reset;
It could be that this test case specifically hits these two cases. In
testing brk() I did not see the same gains that this malloc test had, so
in that case we are probably not traversing the tree again, as you say.
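To make the branch structure concrete, the quoted checks can be modelled as a small standalone function (the struct and field names here are simplified stand-ins for the maple state, not the kernel types):

```c
#include <assert.h>
#include <stdbool.h>

enum setup_action { KEEP_STATE, RESET_WALK, SET_CONTENT };

/* Simplified stand-in for the state consulted by the quoted checks. */
struct model_state {
	unsigned long last;	/* end of the requested range */
	unsigned long max;	/* maximum index covered by the node */
	bool is_leaf;		/* models mte_is_leaf(mas->node) */
	bool has_entry;		/* models wr_mas->entry != NULL */
};

/* Mirrors the mas_is_active() checks quoted above, in the same order. */
static enum setup_action prealloc_setup(const struct model_state *s)
{
	if (s->last > s->max)
		return RESET_WALK;	/* range runs past this node */
	if (s->has_entry)
		return SET_CONTENT;
	if (s->is_leaf && s->last == s->max)
		return RESET_WALK;	/* store ends exactly at node max */
	return KEEP_STATE;
}
```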
Thanks,
Sid
Which part do you think I have missed?
mm-unstable
@bulk_alloc_req:
[2, 4)     1 |                                                    |
[4, 8)  1427 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[8, 16) 2790 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
Sidhartha Kumar (5):
maple_tree: convert mas_prealloc_calc() to take in a maple write state
maple_tree: use height and depth consistently
maple_tree: use vacant nodes to reduce worst case allocations
maple_tree: break on convergence in mas_spanning_rebalance()
maple_tree: add sufficient height
include/linux/maple_tree.h | 4 +
lib/maple_tree.c | 89 +++++++++++++---------
tools/testing/radix-tree/maple.c | 125 +++++++++++++++++++++++++++++--
3 files changed, 176 insertions(+), 42 deletions(-)