On Thu, Jul 20, 2023 at 9:59 PM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> On Thu, Jul 20, 2023 at 12:01 PM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> >
> > > > commit:
> > > >   7bc162d5cc ("Merge branches 'slab/for-6.5/prandom', 'slab/for-6.5/slab_no_merge' and 'slab/for-6.5/slab-deprecate' into slab/for-next")
> > > >   a0fd217e6d ("mm/slub: Optimize slub memory usage")
> > > >
> > > > 7bc162d5cc4de5c3 a0fd217e6d6fbd23e91f8796787
> > > > ---------------- ---------------------------
> > > >          %stddev     %change         %stddev
> > > >              \          |                \
> > > >     222503 ± 86%    +108.7%     464342 ± 58%  numa-meminfo.node1.Active
> > > >     222459 ± 86%    +108.7%     464294 ± 58%  numa-meminfo.node1.Active(anon)
> > > >      55573 ± 85%    +108.0%     115619 ± 58%  numa-vmstat.node1.nr_active_anon
> > > >      55573 ± 85%    +108.0%     115618 ± 58%  numa-vmstat.node1.nr_zone_active_anon
> > >
> > > I'm quite baffled while reading this.
> > > How did changing the slab order calculation double the number of active anon pages?
> > > I doubt the two experiments were performed with the same settings.
> >
> > let me introduce our test process.
> >
> > we make sure the tests upon a commit and its parent have the exact same environment
> > except for the kernel difference, and we also make sure the configs used to build the
> > commit and its parent are identical.
> >
> > we run tests for one commit at least 6 times to make sure the data is stable.
> >
> > for this case, we rebuilt the kernels for the commit and its parent; the
> > config is attached FYI.

Oh, I missed the attachments.

I need more time to look further into this, but could you please test this patch (attached)?
>       0.00          -100.0%       0.00        numa-numastat.node0.interleave_hit
>     646925 ± 26%     +25.4%     811509 ± 29%  numa-numastat.node0.local_node
>     693386 ± 20%     +30.4%     904091 ± 27%  numa-numastat.node0.numa_hit
>      46461 ± 81%    +102.6%      94126 ± 31%  numa-numastat.node0.other_node
>       0.00          -100.0%       0.00        numa-numastat.node1.interleave_hit
>    1571252 ± 18%     -14.3%    1346549 ± 13%  numa-numastat.node1.local_node
>    1663884 ± 16%     -16.3%    1393406 ± 13%  numa-numastat.node1.numa_hit
>      92593 ± 39%     -49.5%      46769 ± 61%  numa-numastat.node1.other_node

After skimming the attachments, I started thinking it is undesirable to allocate high-order slabs from remote nodes.
From d688270274febf4115c9c28712d8ff08ca2bee1a Mon Sep 17 00:00:00 2001
From: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
Date: Thu, 20 Jul 2023 22:29:16 +0900
Subject: [PATCH] mm/slub: do not allocate from remote node to allocate high
 order slab

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index f7940048138c..1f25888d9a41 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2012,7 +2012,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 */
 	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
 	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
-		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
+		alloc_gfp = ((alloc_gfp | __GFP_THISNODE | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM);

 	slab = alloc_slab_page(alloc_gfp, node, oo);
 	if (unlikely(!slab)) {
-- 
2.41.0