> On 4/4/24 7:58 AM, xiongwei.song@xxxxxxxxxxxxx wrote:
> > From: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
> >
> > The break conditions for filling cpu partial can be made simpler and
> > more readable.
> >
> > If slub_get_cpu_partial() returns 0, we know we don't need to fill
> > the cpu partial list, so we should break from the loop. On the other
> > hand, we should also break from the loop once we have added enough
> > cpu partial slabs.
> >
> > Meanwhile, the logic above gets rid of the #ifdef and also fixes a
> > weird corner case: if we set cpu_partial_slabs to 0 from sysfs, we
> > still allocate at least one slab here.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@xxxxxxxxxxxxx>
> > ---
> >
> > The measurement below compares the performance effects of breaking
> > from the cpu partial filling loop with either of the following
> > conditions:
> >
> > Condition 1:
> > When the count of added cpu slabs is greater than cpu_partial_slabs/2:
> > (partial_slabs > slub_get_cpu_partial(s) / 2)
> >
> > Condition 2:
> > When the count of added cpu slabs is greater than or equal to
> > cpu_partial_slabs/2:
> > (partial_slabs >= slub_get_cpu_partial(s) / 2)
> >
> > The choice of break condition affects how many cpu partial slabs are
> > put on the cpu partial list.
> >
> > The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz"
> > cpu with 16 cores. The OS is Ubuntu 22.04.
> >
> > hackbench-process-pipes
> >                  6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> > Amean     1      0.0373 (  0.00%)       0.0356 *   4.60%*
> > Amean     4      0.0984 (  0.00%)       0.1014 *  -3.05%*
> > Amean     7      0.1803 (  0.00%)       0.1851 *  -2.69%*
> > Amean    12      0.2947 (  0.00%)       0.3141 *  -6.59%*
> > Amean    21      0.4577 (  0.00%)       0.4927 *  -7.65%*
> > Amean    30      0.6326 (  0.00%)       0.6649 *  -5.10%*
> > Amean    48      0.9396 (  0.00%)       0.9884 *  -5.20%*
> > Amean    64      1.2321 (  0.00%)       1.3004 *  -5.54%*
> >
> > hackbench-process-sockets
> >                  6.9-rc2(with ">")      6.9.0-rc2(with ">=")
> > Amean     1      0.0609 (  0.00%)       0.0623 *  -2.35%*
> > Amean     4      0.2107 (  0.00%)       0.2140 *  -1.56%*
> > Amean     7      0.3754 (  0.00%)       0.3966 *  -5.63%*
> > Amean    12      0.6456 (  0.00%)       0.6734 *  -4.32%*
> > Amean    21      1.1440 (  0.00%)       1.1769 *  -2.87%*
> > Amean    30      1.6629 (  0.00%)       1.7031 *  -2.42%*
> > Amean    48      2.7321 (  0.00%)       2.7897 *  -2.11%*
> > Amean    64      3.7397 (  0.00%)       3.7640 *  -0.65%*
> >
> > It seems there is a slight performance penalty when using ">=" to
> > break the loop. Hence, we should still use ">" here.
>
> Thanks for evaluating that, I suspected that would be the case, so we
> should not change that performance aspect as part of a cleanup.
>
> > ---
> >  mm/slub.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..6beff3b1e22c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> >  			stat(s, CPU_PARTIAL_NODE);
> >  			partial_slabs++;
> >  		}
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > -		if (partial_slabs > s->cpu_partial_slabs / 2)
> > -			break;
> > -#else
> > -		break;
> > -#endif
> >
> > +		if ((slub_get_cpu_partial(s) == 0) ||
> > +		    (partial_slabs > slub_get_cpu_partial(s) / 2))
> > +			break;
> >  	}
> >  	spin_unlock_irqrestore(&n->list_lock, flags);
> >  	return partial;
>
> After looking at the result and your v1 again, I arrived at this
> modification that incorporates the core v1 idea without reintroducing
> kmem_cache_has_cpu_partial(). The modified patch looks like below. Is
> it OK with you?
>
> Pushed the whole series with this modification to slab/for-next for
> now.
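For context, the break condition in the patch above (and in the
modified version below) relies on slub_get_cpu_partial(), introduced
earlier in this series. A sketch of its assumed shape (not the
verbatim mm/slub.c definition) shows why the
"slub_get_cpu_partial(s) == 0" check subsumes the old #ifdef: with
CONFIG_SLUB_CPU_PARTIAL disabled the helper always returns 0, so the
loop stops after taking a single slab.

#ifdef CONFIG_SLUB_CPU_PARTIAL
/* Assumed shape: return the sysfs-tunable cpu partial list limit. */
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
        return s->cpu_partial_slabs;
}
#else
/* Assumed shape: no cpu partial list exists in this configuration. */
static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
        return 0;
}
#endif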
Sorry for the late response, I was on vacation. I'm ok with the patch
below.

Thanks,
Xiongwei

> ----8<-----
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  		if (!partial) {
>  			partial = slab;
>  			stat(s, ALLOC_FROM_PARTIAL);
> +			if ((slub_get_cpu_partial(s) == 0)) {
> +				break;
> +			}
>  		} else {
>  			put_cpu_partial(s, slab, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> -			partial_slabs++;
> -		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif
>
> +			if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
> +				break;
> +			}
> +		}
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;
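To make the resulting behavior concrete, here is a small userspace
model of the patched loop's break logic (a hypothetical sketch;
slabs_taken() and its parameters are illustrative only, not kernel
code). It demonstrates the fixed corner case: with cpu_partial_slabs
set to 0 exactly one slab is taken from the node partial list, while a
nonzero value yields one slab returned directly plus
cpu_partial_slabs/2 + 1 slabs moved to the cpu partial list.

#include <stdio.h>

/* Model of the break logic in the patched get_partial_node(); the
 * counting mirrors the diff above under the stated assumptions. */
static unsigned int slabs_taken(unsigned int cpu_partial_slabs,
                                unsigned int available)
{
        unsigned int taken = 0, partial_slabs = 0;
        int have_primary = 0;

        for (unsigned int i = 0; i < available; i++) {
                taken++;                        /* slab removed from the node list */
                if (!have_primary) {
                        have_primary = 1;       /* first slab is returned to the caller */
                        if (cpu_partial_slabs == 0)
                                break;          /* no cpu partial list configured */
                } else {
                        partial_slabs++;        /* slab goes to the cpu partial list */
                        if (partial_slabs > cpu_partial_slabs / 2)
                                break;
                }
        }
        return taken;
}

int main(void)
{
        printf("cpu_partial_slabs=0 -> %u slab(s) taken\n", slabs_taken(0, 8));
        printf("cpu_partial_slabs=6 -> %u slab(s) taken\n", slabs_taken(6, 8));
        return 0;
}

Under these assumptions the program prints 1 and 5 taken slabs
respectively, matching the "greater than cpu_partial_slabs/2" break
semantics that the measurement above favors.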