On Wed, 2023-07-12 at 15:06 +0200, Vlastimil Babka wrote:
> On 6/28/23 11:57, Jay Patel wrote:
> > In the previous version [1], we were able to reduce slub memory
> > wastage, but the total memory was also increasing. To solve this
> > problem, I have modified the patch as follows:
> >
> > 1) If min_objects * object_size > (PAGE_SIZE <<
> > PAGE_ALLOC_COSTLY_ORDER), then it will return with
> > PAGE_ALLOC_COSTLY_ORDER.
> > 2) Similarly, if min_objects * object_size <= PAGE_SIZE, then it
> > will return with slub_min_order.
> > 3) Additionally, I changed slub_max_order to 2. There is no
> > specific reason for using the value 2, but it provided the best
> > results in terms of performance without any noticeable impact.
> >
> > [1]
> >
>
> Hi,
>
> thanks for the v2. A process note: the changelog should be
> self-contained, as it will become the commit description in git log.
> What this would mean here is to take the v1 changelog and adjust the
> description to how v2 is implemented, and of course replace the v1
> measurements with new ones.
>
> The "what changed since v1" can be summarized in the area after the
> sign-off and "---", before the diffstat. This helps those that looked
> at v1 previously, but doesn't become part of git log.
>
> Now, my impression is that v1 made a sensible tradeoff for 4K pages,
> as the wastage was reduced, yet overall slab consumption didn't
> increase much. But for 64K the tradeoff looked rather bad. I think
> it's because with 64K pages and certain object size you can e.g. get
> less waste with order-3 than order-2, but the difference will be a
> relatively tiny part of the 64KB, so it's not worth the increase of
> order, while with 4KB you can get a larger reduction of waste both in
> absolute amount and especially relative to the 4KB size.
>
> So I think ideally the calculation would somehow take this into
> account. The changes done in v2 as described above are different. It
> seems as a result we can now calculate lower orders on 4K systems
> than before the patch, probably due to conditions 2) or 3)? I think it
> would be best if the patch resulted only in the same or higher order.
> It should be enough to tweak some thresholds for when it makes sense
> to pay the price of higher order - whether the reduction of wastage
> is worth it, in a way that takes the page size into account.
>
> Thanks,
> Vlastimil

Hi Vlastimil,

Indeed, I aim to optimize memory allocation in the SLUB allocator [1]
by targeting larger page sizes with minimal modifications, resulting
in reduced memory consumption.

[1] https://lore.kernel.org/linux-mm/20230720102337.2069722-1-jaypatel@xxxxxxxxxxxxx/

Thanks,
Jay Patel

> > I have conducted tests on systems with 160 CPUs and 16 CPUs using
> > 4K and 64K page sizes. The tests showed that the patch successfully
> > reduces both the total slab memory and its wastage without any
> > noticeable performance degradation in the hackbench test.
> >
> > Test Results are as follows:
> > 1) On 160 CPUs with 4K Page size
> >
> > +-----------------+-----------------+-----------------+
> > | Total wastage in slub memory                        |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 2090 Kb         | 3204 Kb         |
> > | With Patch      | 1825 Kb         | 3088 Kb         |
> > | Wastage reduce  | ~12%            | ~4%             |
> > +-----------------+-----------------+-----------------+
> >
> > +-----------------+-----------------+-----------------+
> > | Total slub memory                                   |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 500572          | 713568          |
> > | With Patch      | 482036          | 688312          |
> > | Memory reduce   | ~4%             | ~3%             |
> > +-----------------+-----------------+-----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > |       |     |  Normal  |With Patch|           |
> > +-------+-----+----------+----------+-----------+
> > | Amean |   1 |   1.3237 |   1.2737 | (  3.78%) |
> > | Amean |   4 |   1.5923 |   1.6023 | ( -0.63%) |
> > | Amean |   7 |   2.3727 |   2.4260 | ( -2.25%) |
> > | Amean |  12 |   3.9813 |   4.1290 | ( -3.71%) |
> > | Amean |  21 |   6.9680 |   7.0630 | ( -1.36%) |
> > | Amean |  30 |  10.1480 |  10.2170 | ( -0.68%) |
> > | Amean |  48 |  16.7793 |  16.8780 | ( -0.59%) |
> > | Amean |  79 |  28.9537 |  28.8187 | (  0.47%) |
> > | Amean | 110 |  39.5507 |  40.0157 | ( -1.18%) |
> > | Amean | 141 |  51.5670 |  51.8200 | ( -0.49%) |
> > | Amean | 172 |  62.8710 |  63.2540 | ( -0.61%) |
> > | Amean | 203 |  74.6417 |  75.2520 | ( -0.82%) |
> > | Amean | 234 |  86.0853 |  86.5653 | ( -0.56%) |
> > | Amean | 265 |  97.9203 |  98.4617 | ( -0.55%) |
> > | Amean | 296 | 108.6243 | 109.8770 | ( -1.15%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 2) On 160 CPUs with 64K Page size
> > +-----------------+-----------------+-----------------+
> > | Total wastage in slub memory                        |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 919 Kb          | 1880 Kb         |
> > | With Patch      | 807 Kb          | 1684 Kb         |
> > | Wastage reduce  | ~12%            | ~10%            |
> > +-----------------+-----------------+-----------------+
> >
> > +-----------------+-----------------+-----------------+
> > | Total slub memory                                   |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 1862592         | 3023744         |
> > | With Patch      | 1644416         | 2675776         |
> > | Memory reduce   | ~12%            | ~11%            |
> > +-----------------+-----------------+-----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > |       |     |  Normal  |With Patch|           |
> > +-------+-----+----------+----------+-----------+
> > | Amean |   1 |   1.2547 |   1.2677 | ( -1.04%) |
> > | Amean |   4 |   1.5523 |   1.5783 | ( -1.67%) |
> > | Amean |   7 |   2.4157 |   2.3883 | (  1.13%) |
> > | Amean |  12 |   3.9807 |   3.9793 | (  0.03%) |
> > | Amean |  21 |   6.9687 |   6.9703 | ( -0.02%) |
> > | Amean |  30 |  10.1403 |  10.1297 | (  0.11%) |
> > | Amean |  48 |  16.7477 |  16.6893 | (  0.35%) |
> > | Amean |  79 |  27.9510 |  28.0463 | ( -0.34%) |
> > | Amean | 110 |  39.6833 |  39.5687 | (  0.29%) |
> > | Amean | 141 |  51.5673 |  51.4477 | (  0.23%) |
> > | Amean | 172 |  62.9643 |  63.1647 | ( -0.32%) |
> > | Amean | 203 |  74.6220 |  73.7900 | (  1.11%) |
> > | Amean | 234 |  85.1783 |  85.3420 | ( -0.19%) |
> > | Amean | 265 |  96.6627 |  96.7903 | ( -0.13%) |
> > | Amean | 296 | 108.2543 | 108.2253 | (  0.03%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 3) On 16 CPUs with 4K Page size
> > +-----------------+-----------------+-----------------+
> > | Total wastage in slub memory                        |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 491 Kb          | 727 Kb          |
> > | With Patch      | 483 Kb          | 670 Kb          |
> > | Wastage reduce  | ~1%             | ~8%             |
> > +-----------------+-----------------+-----------------+
> >
> > +-----------------+-----------------+-----------------+
> > | Total slub memory                                   |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 105340          | 153116          |
> > | With Patch      | 103620          | 147412          |
> > | Memory reduce   | ~1.6%           | ~4%             |
> > +-----------------+-----------------+-----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > |       |     |  Normal  |With Patch|           |
> > +-------+-----+----------+----------+-----------+
> > | Amean |   1 |   1.0963 |   1.1070 | ( -0.97%) |
> > | Amean |   4 |   3.7963 |   3.7957 | (  0.02%) |
> > | Amean |   7 |   6.5947 |   6.6017 | ( -0.11%) |
> > | Amean |  12 |  11.1993 |  11.1730 | (  0.24%) |
> > | Amean |  21 |  19.4097 |  19.3647 | (  0.23%) |
> > | Amean |  30 |  27.7023 |  27.6040 | (  0.35%) |
> > | Amean |  48 |  44.1287 |  43.9630 | (  0.38%) |
> > | Amean |  64 |  58.8147 |  58.5753 | (  0.41%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 4) On 16 CPUs with 64K Page size
> > +-----------------+-----------------+-----------------+
> > | Total wastage in slub memory                        |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 194 Kb          | 349 Kb          |
> > | With Patch      | 191 Kb          | 344 Kb          |
> > | Wastage reduce  | ~1%             | ~1%             |
> > +-----------------+-----------------+-----------------+
> >
> > +-----------------+-----------------+-----------------+
> > | Total slub memory                                   |
> > +-----------------+-----------------+-----------------+
> > |                 | After Boot      | After Hackbench |
> > | Normal          | 330304          | 472960          |
> > | With Patch      | 319808          | 458944          |
> > | Memory reduce   | ~3%             | ~3%             |
> > +-----------------+-----------------+-----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > |       |     |  Normal  |With Patch|           |
> > +-------+-----+----------+----------+-----------+
> > | Amean |   1 |   1.9030 |   1.8967 | (  0.33%) |
> > | Amean |   4 |   7.2117 |   7.1283 | (  1.16%) |
> > | Amean |   7 |  12.5247 |  12.3460 | (  1.43%) |
> > | Amean |  12 |  21.7157 |  21.4753 | (  1.11%) |
> > | Amean |  21 |  38.2693 |  37.6670 | (  1.57%) |
> > | Amean |  30 |  54.5930 |  53.8657 | (  1.33%) |
> > | Amean |  48 |  87.6700 |  86.3690 | (  1.48%) |
> > | Amean |  64 | 117.1227 | 115.4893 | (  1.39%) |
> > +-------+-----+----------+----------+-----------+
> >
> > Signed-off-by: Jay Patel <jaypatel@xxxxxxxxxxxxx>
> > ---
> >  mm/slub.c | 52 +++++++++++++++++++++++++---------------------------
> >  1 file changed, 25 insertions(+), 27 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c87628cd8a9a..0a1090c528da 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4058,7 +4058,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> >   */
> >  static unsigned int slub_min_order;
> >  static unsigned int slub_max_order =
> > -	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> > +	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 2;
> >  static unsigned int slub_min_objects;
> >  
> >  /*
> > @@ -4087,11 +4087,10 @@ static unsigned int slub_min_objects;
> >   * the smallest order which will fit the object.
> >   */
> >  static inline unsigned int calc_slab_order(unsigned int size,
> > -		unsigned int min_objects, unsigned int max_order,
> > -		unsigned int fract_leftover)
> > +		unsigned int min_objects, unsigned int max_order)
> >  {
> >  	unsigned int min_order = slub_min_order;
> > -	unsigned int order;
> > +	unsigned int order, min_wastage = size, min_wastage_order = MAX_ORDER+1;
> >  
> >  	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
> >  		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
> > @@ -4104,11 +4103,17 @@ static inline unsigned int calc_slab_order(unsigned int size,
> >  
> >  		rem = slab_size % size;
> >  
> > -		if (rem <= slab_size / fract_leftover)
> > -			break;
> > +		if (rem < min_wastage) {
> > +			min_wastage = rem;
> > +			min_wastage_order = order;
> > +		}
> >  	}
> >  
> > -	return order;
> > +	if (min_wastage_order <= slub_max_order)
> > +		return min_wastage_order;
> > +	else
> > +		return order;
> > +
> >  }
> >  
> >  static inline int calculate_order(unsigned int size)
> > @@ -4142,35 +4147,28 @@ static inline int calculate_order(unsigned int size)
> >  			nr_cpus = nr_cpu_ids;
> >  		min_objects = 4 * (fls(nr_cpus) + 1);
> >  	}
> > +
> > +	if ((min_objects * size) > (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > +		return PAGE_ALLOC_COSTLY_ORDER;
> > +
> > +	if ((min_objects * size) <= PAGE_SIZE)
> > +		return slub_min_order;
> > +
> >  	max_objects = order_objects(slub_max_order, size);
> >  	min_objects = min(min_objects, max_objects);
> >  
> > -	while (min_objects > 1) {
> > -		unsigned int fraction;
> > -
> > -		fraction = 16;
> > -		while (fraction >= 4) {
> > -			order = calc_slab_order(size, min_objects,
> > -					slub_max_order, fraction);
> > -			if (order <= slub_max_order)
> > -				return order;
> > -			fraction /= 2;
> > -		}
> > +	while (min_objects >= 1) {
> > +		order = calc_slab_order(size, min_objects,
> > +				slub_max_order);
> > +		if (order <= slub_max_order)
> > +			return order;
> >  		min_objects--;
> >  	}
> >  
> > -	/*
> > -	 * We were unable to place multiple objects in a slab. Now
> > -	 * lets see if we can place a single object there.
> > -	 */
> > -	order = calc_slab_order(size, 1, slub_max_order, 1);
> > -	if (order <= slub_max_order)
> > -		return order;
> > -
> >  	/*
> >  	 * Doh this slab cannot be placed using slub_max_order.
> >  	 */
> > -	order = calc_slab_order(size, 1, MAX_ORDER, 1);
> > +	order = calc_slab_order(size, 1, MAX_ORDER);
> >  	if (order <= MAX_ORDER)
> >  		return order;
> >  	return -ENOSYS;
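
As a rough illustration of the page-size point raised in the review
(not part of the patch), here is a userspace sketch of the leftover
arithmetic, rem = slab_size % size, that calc_slab_order() performs
per order. The helper names slab_leftover() and min_leftover_order()
are hypothetical, the 700-byte object size used below is an arbitrary
example value, and the sketch deliberately ignores min_objects,
slub_min_order and MAX_OBJS_PER_PAGE, so it only approximates what the
kernel would pick.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): bytes left unused when
 * objects of obj_size are packed into a slab of (page_size << order)
 * bytes. Mirrors "rem = slab_size % size" in calc_slab_order().
 */
static unsigned long slab_leftover(unsigned long page_size,
				   unsigned int order,
				   unsigned long obj_size)
{
	assert(obj_size > 0);
	return (page_size << order) % obj_size;
}

/*
 * Hypothetical helper: lowest order in [0, max_order] with the smallest
 * leftover. The strict "<" keeps the lower order on ties, like the
 * "rem < min_wastage" test in the patch. Assumption: the min_objects
 * lower bound on the starting order is ignored here.
 */
static unsigned int min_leftover_order(unsigned long page_size,
				       unsigned int max_order,
				       unsigned long obj_size)
{
	unsigned int order, best_order = 0;
	unsigned long best_rem = slab_leftover(page_size, 0, obj_size);

	for (order = 1; order <= max_order; order++) {
		unsigned long rem = slab_leftover(page_size, order, obj_size);

		if (rem < best_rem) {
			best_rem = rem;
			best_order = order;
		}
	}
	return best_order;
}
```

For a 700-byte object, slab_leftover(4096, 0, 700) is 596 bytes
(roughly 15% of the slab) and drops to 284 bytes at order 2, whereas
slab_leftover(65536, 0, 700) is 436 bytes, already under 1% of a 64K
slab. Picking the order purely by absolute leftover therefore buys
much less on 64K pages, which is the relative-cost consideration the
review asks the thresholds to account for.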