On Fri, Aug 11, 2023 at 3:52 PM Jay Patel <jaypatel@xxxxxxxxxxxxx> wrote:
>
> On Fri, 2023-08-11 at 02:54 +0900, Hyeonggon Yoo wrote:
> > On Thu, Jul 20, 2023 at 7:24 PM Jay Patel <jaypatel@xxxxxxxxxxxxx> wrote:
> > > In the current implementation of the slub memory allocator, the
> > > slab order selection process follows these criteria:
> > >
> > > 1) Determine the minimum order required to serve the minimum number
> > > of objects (min_objects). This calculation is based on the formula
> > > (order = min_objects * object_size / PAGE_SIZE).
> > > 2) If the minimum order is greater than the maximum allowed order
> > > (slub_max_order), set slub_max_order as the order for this slab.
> > > 3) If the minimum order is less than slub_max_order, iterate
> > > through a loop from the minimum order to slub_max_order and check
> > > whether the condition (rem <= slab_size / fract_leftover) holds.
> > > Here, slab_size is (PAGE_SIZE << order), rem is (slab_size %
> > > object_size), and fract_leftover can be 16, 8, or 4. If the
> > > condition is true, select that order for the slab.
> > >
> > > However, in point 3, the allowed leftover (slab_size /
> > > fract_leftover) can span a large range of values (256 bytes to
> > > 1 KB at order 0 with a 4K page size, 4 KB to 16 KB with a 64K page
> > > size, and growing with higher orders) when compared to the
> > > remainder (rem). This can lead to the selection of an order that
> > > results in more memory wastage. To mitigate such wastage, point 3
> > > is modified to adjust the value of fract_leftover based on the
> > > page size, while retaining the current value as the default for a
> > > 4K page size.
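To make the heuristic above concrete, here is a simplified, standalone C
sketch of it (illustrative only, not the mm/slub.c code: the EXAMPLE_*
constants and the example_slab_order()/example_calculate_order() names
are made up, and SLUB_PAGE_FRAC_SHIFT is redefined locally to mirror the
macro added by this patch):

#include <stdio.h>

#define EXAMPLE_PAGE_SIZE	(64UL * 1024)	/* pretend 64K pages */
#define EXAMPLE_MAX_ORDER	3U
#define SLUB_PAGE_FRAC_SHIFT	12

/* Point 3 above: starting from the smallest order that can hold
 * min_objects, pick the first order whose leftover (rem) is within
 * slab_size / fract_leftover. */
static unsigned int example_slab_order(unsigned int size, unsigned int min_objects,
				       unsigned int max_order, unsigned int fract_leftover)
{
	unsigned int order = 0;

	while (order <= max_order &&
	       (EXAMPLE_PAGE_SIZE << order) < (unsigned long)min_objects * size)
		order++;

	for (; order <= max_order; order++) {
		unsigned long slab_size = EXAMPLE_PAGE_SIZE << order;
		unsigned long rem = slab_size % size;

		if (rem <= slab_size / fract_leftover)
			break;
	}
	return order;	/* may be max_order + 1 if no order is acceptable */
}

/* Caller loop: try progressively smaller fractions (and then fewer
 * objects), with the starting fraction scaled by the page size as in
 * this patch: 4K >> 12 == 1, so 4K pages keep the default 16, while
 * 64K pages start at 16 + 16 = 32. */
static unsigned int example_calculate_order(unsigned int size, unsigned int min_objects)
{
	unsigned int page_size_frac =
		(EXAMPLE_PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT) == 1 ?
		0 : EXAMPLE_PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT;

	while (min_objects >= 1) {
		unsigned int fraction;

		for (fraction = 16 + page_size_frac; fraction >= 4; fraction /= 2) {
			unsigned int order = example_slab_order(size, min_objects,
								EXAMPLE_MAX_ORDER,
								fraction);
			if (order <= EXAMPLE_MAX_ORDER)
				return order;
		}
		min_objects--;
	}
	return EXAMPLE_MAX_ORDER;
}

int main(void)
{
	/* 700-byte objects: at order 0, rem = 65536 % 700 = 436, which is
	 * within 65536 / 32 = 2048, so order 0 is chosen. */
	printf("order = %u\n", example_calculate_order(700, 8));
	return 0;
}

Built as a normal userspace program this prints "order = 0" for the
700-byte example; with EXAMPLE_PAGE_SIZE set to 4096 the starting
fraction falls back to the default 16.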
> > >
> > > Test results are as follows:
> > >
> > > 1) On 160 CPUs with 64K page size
> > >
> > > +-----------------+----------------+----------------+
> > > |            Total wastage in slub memory           |
> > > +-----------------+----------------+----------------+
> > > |                 | After Boot     | After Hackbench|
> > > | Normal          | 932 KB         | 1812 KB        |
> > > | With Patch      | 729 KB         | 1636 KB        |
> > > | Wastage reduce  | ~22%           | ~10%           |
> > > +-----------------+----------------+----------------+
> > >
> > > +-----------------+----------------+----------------+
> > > |                  Total slub memory                |
> > > +-----------------+----------------+----------------+
> > > |                 | After Boot     | After Hackbench|
> > > | Normal          | 1855296        | 2944576        |
> > > | With Patch      | 1544576        | 2692032        |
> > > | Memory reduce   | ~17%           | ~9%            |
> > > +-----------------+----------------+----------------+
> > >
> > > hackbench-process-sockets
> > > +-------+-----+----------+----------+-----------+
> > > | Amean | 1   | 1.2727   | 1.2450   | ( 2.22%)  |
> > > | Amean | 4   | 1.6063   | 1.5810   | ( 1.60%)  |
> > > | Amean | 7   | 2.4190   | 2.3983   | ( 0.86%)  |
> > > | Amean | 12  | 3.9730   | 3.9347   | ( 0.97%)  |
> > > | Amean | 21  | 6.9823   | 6.8957   | ( 1.26%)  |
> > > | Amean | 30  | 10.1867  | 10.0600  | ( 1.26%)  |
> > > | Amean | 48  | 16.7490  | 16.4853  | ( 1.60%)  |
> > > | Amean | 79  | 28.1870  | 27.8673  | ( 1.15%)  |
> > > | Amean | 110 | 39.8363  | 39.3793  | ( 1.16%)  |
> > > | Amean | 141 | 51.5277  | 51.4907  | ( 0.07%)  |
> > > | Amean | 172 | 62.9700  | 62.7300  | ( 0.38%)  |
> > > | Amean | 203 | 74.5037  | 74.0630  | ( 0.59%)  |
> > > | Amean | 234 | 85.6560  | 85.3587  | ( 0.35%)  |
> > > | Amean | 265 | 96.9883  | 96.3770  | ( 0.63%)  |
> > > | Amean | 296 | 108.6893 | 108.0870 | ( 0.56%)  |
> > > +-------+-----+----------+----------+-----------+
> > >
> > > 2) On 16 CPUs with 64K page size
> > >
> > > +----------------+----------------+----------------+
> > > |           Total wastage in slub memory           |
> > > +----------------+----------------+----------------+
> > > |                | After Boot     | After Hackbench|
> > > | Normal         | 273 KB         | 544 KB         |
> > > | With Patch     | 260 KB         | 500 KB         |
> > > | Wastage reduce | ~5%            | ~9%            |
> > > +----------------+----------------+----------------+
> > >
> > > +-----------------+----------------+----------------+
> > > |                  Total slub memory                |
> > > +-----------------+----------------+----------------+
> > > |                 | After Boot     | After Hackbench|
> > > | Normal          | 275840         | 412480         |
> > > | With Patch      | 272768         | 406208         |
> > > | Memory reduce   | ~1%            | ~2%            |
> > > +-----------------+----------------+----------------+
> > >
> > > hackbench-process-sockets
> > > +-------+----+---------+---------+-----------+
> > > | Amean | 1  | 0.9513  | 0.9250  | ( 2.77%)  |
> > > | Amean | 4  | 2.9630  | 2.9570  | ( 0.20%)  |
> > > | Amean | 7  | 5.1780  | 5.1763  | ( 0.03%)  |
> > > | Amean | 12 | 8.8833  | 8.8817  | ( 0.02%)  |
> > > | Amean | 21 | 15.7577 | 15.6883 | ( 0.44%)  |
> > > | Amean | 30 | 22.2063 | 22.2843 | ( -0.35%) |
> > > | Amean | 48 | 36.0587 | 36.1390 | ( -0.22%) |
> > > | Amean | 64 | 49.7803 | 49.3457 | ( 0.87%)  |
> > > +-------+----+---------+---------+-----------+
> > >
> > > Signed-off-by: Jay Patel <jaypatel@xxxxxxxxxxxxx>
> > > ---
> > > Changes from V3
> > > 1) Resolved error and optimised logic for all arches.
> > >
> > > Changes from V2
> > > 1) Removed all page order selection logic for slab caches based on
> > > wastage.
> > > 2) Increasing fraction size based on page size (keeping the
> > > current value as the default for a 4K page).
> > >
> > > Changes from V1
> > > 1) If min_objects * object_size > PAGE_ALLOC_COSTLY_ORDER, then it
> > > will return with PAGE_ALLOC_COSTLY_ORDER.
> > > 2) Similarly, if min_objects * object_size < PAGE_SIZE, then it
> > > will return with slub_min_order.
> > > 3) Additionally, I changed slub_max_order to 2. There is no
> > > specific reason for using the value 2, but it provided the best
> > > results in terms of performance without any noticeable impact.
> > >
> > >  mm/slub.c | 17 +++++++----------
> > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index c87628cd8a9a..8f6f38083b94 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -287,6 +287,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
> > >  #define OO_SHIFT		16
> > >  #define OO_MASK			((1 << OO_SHIFT) - 1)
> > >  #define MAX_OBJS_PER_PAGE	32767 /* since slab.objects is u15 */
> > > +#define SLUB_PAGE_FRAC_SHIFT 12
> > >
> > >  /* Internal SLUB flags */
> > >  /* Poison object */
> > > @@ -4117,6 +4118,7 @@ static inline int calculate_order(unsigned int size)
> > >  	unsigned int min_objects;
> > >  	unsigned int max_objects;
> > >  	unsigned int nr_cpus;
> > > +	unsigned int page_size_frac;
> > >
> > >  	/*
> > >  	 * Attempt to find best configuration for a slab. This
> > > @@ -4145,10 +4147,13 @@ static inline int calculate_order(unsigned int size)
> > >  	max_objects = order_objects(slub_max_order, size);
> > >  	min_objects = min(min_objects, max_objects);
> > >
> > > -	while (min_objects > 1) {
> > > +	page_size_frac = ((PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT) == 1) ? 0
> > > +			: PAGE_SIZE >> SLUB_PAGE_FRAC_SHIFT;
> > > +
> > > +	while (min_objects >= 1) {
> > >  		unsigned int fraction;
> > >
> > > -		fraction = 16;
> > > +		fraction = 16 + page_size_frac;
> > >  		while (fraction >= 4) {
> > Sorry, I'm a bit late for the review.
> >
> > IIRC hexagon/powerpc can have ridiculously large page sizes (1M or
> > 256KB) (but I don't know if such a config is actually used, tbh), so
> > I think there should be an upper bound.
>
> Hi,
> I think that might not be required, as an arch with a larger page size
> will require a larger fraction value as per this exit condition
> (rem <= slab_size / fract_leftover) during calc_slab_order.

Okay, with 256KB pages the fraction will start from 80, and then 40,
20, 10, 5, ... and 1/80 of 256KB is about 3KB. So the point is to waste
less even when the machine uses large page sizes, because 1/16 of 256KB
is still large, right?

> > > 			order = calc_slab_order(size, min_objects,
> > > 					slub_max_order, fraction);
> > > @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
> > >  		min_objects--;
> > >  	}
> > > -	/*
> > > -	 * We were unable to place multiple objects in a slab. Now
> > > -	 * lets see if we can place a single object there.
> > > -	 */
> > > -	order = calc_slab_order(size, 1, slub_max_order, 1);
> > > -	if (order <= slub_max_order)
> > > -		return order;
> >
> > I'm not sure if it's okay to remove this?
> > It was fine in v2 because the least wasteful order was chosen
> > regardless of fraction, but that's not true anymore.
>
> Ok, so my thought is: if a single object in a slab with slab_size =
> PAGE_SIZE << slub_max_order wastes more than 1/4th of the slab_size,
> then it's better to skip this part and use MAX_ORDER instead of
> slub_max_order.
> Could you kindly share your perspective on this part?

I simply missed that part! :) That looks fine to me.

> Thanks
> Jay Patel
> > Otherwise, everything looks fine to me. I'm too dumb to anticipate
> > the outcome of increasing the slab order :P but this patch does not
> > sound crazy to me.
> >
> > Thanks!
> > --
> > Hyeonggon
>
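For reference, the fraction/leftover arithmetic discussed above (a
starting fraction of 80 for 256KB pages, with 1/80 of 256KB being about
3KB) can be reproduced with a small standalone snippet (illustrative
only, not kernel code; the hard-coded 12 stands in for
SLUB_PAGE_FRAC_SHIFT from the patch):

#include <stdio.h>

int main(void)
{
	/* Page sizes discussed in this thread: 4K (default), 64K, 256K. */
	unsigned long page_sizes[] = { 4096UL, 65536UL, 262144UL };
	unsigned int i;

	for (i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++) {
		unsigned long ps = page_sizes[i];
		unsigned long frac = ps >> 12;	/* SLUB_PAGE_FRAC_SHIFT */
		unsigned long fraction = 16 + (frac == 1 ? 0 : frac);

		/* Largest leftover tolerated at order 0 for this fraction. */
		printf("%4luK pages: starting fraction %2lu, order-0 leftover bound %lu bytes\n",
		       ps >> 10, fraction, ps / fraction);
	}
	return 0;
}

This prints a starting fraction of 16 (256-byte bound) for 4K pages, 32
(2048 bytes) for 64K pages, and 80 (3276 bytes) for 256KB pages, which
matches the ~3KB figure mentioned above.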