Re: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage

On 6/28/23 11:57, Jay Patel wrote:
> In the previous version [1], we were able to reduce slub memory
> wastage, but the total memory was also increasing. To solve this
> problem, I have modified the patch as follows:
> 
> 1) If min_objects * object_size > PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER,
> then it will return with PAGE_ALLOC_COSTLY_ORDER.
> 2) Similarly, if min_objects * object_size <= PAGE_SIZE, then it will
> return with slub_min_order.
> 3) Additionally, I changed slub_max_order to 2. There is no specific
> reason for using the value 2, but it provided the best results in
> terms of performance without any noticeable impact.
> 
> [1]
> https://lore.kernel.org/linux-mm/20230612085535.275206-1-jaypatel@xxxxxxxxxxxxx/

Hi,

thanks for the v2. A process note: the changelog should be self-contained, as
it will become the commit description in git log. Here that would mean taking
the v1 changelog, adjusting the description to how v2 is implemented, and of
course replacing the v1 measurements with new ones.

The "what changed since v1" can be summarized in the area after sign-off and
"---", before the diffstat. This helps those that looked at v1 previously,
but doesn't become part of git log.

Now, my impression is that v1 made a sensible tradeoff for 4K pages, as the
wastage was reduced, yet overall slab consumption didn't increase much. But
for 64K the tradeoff looked rather bad. I think it's because with 64K pages
and a certain object size you can e.g. get less waste with order-3 than
order-2, but the difference will be a relatively tiny part of the 64KB, so
it's not worth the increase of order, while with 4KB you can get a larger
reduction of waste both in absolute amount and especially relative to the
4KB size.
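
To make the arithmetic concrete, here is a minimal userspace sketch. It
reuses the "slab_size % size" remainder from calc_slab_order(), but the
helper names and the 2712-byte object size are assumptions picked purely
for illustration; they are not taken from mm/slub.c or from any real cache.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Bytes left unused at the end of a slab of the given order. This is the
 * same "slab_size % size" remainder that calc_slab_order() looks at.
 * Hypothetical helper, not a kernel function.
 */
static unsigned int leftover(unsigned int page_size, unsigned int order,
			     unsigned int obj_size)
{
	assert(obj_size != 0);
	return (page_size << order) % obj_size;
}

/*
 * Waste saved by moving from order lo to order hi, expressed in permille
 * of one base page. Returns 0 when the higher order saves nothing.
 */
static unsigned int saved_permille_of_page(unsigned int page_size,
					   unsigned int lo, unsigned int hi,
					   unsigned int obj_size)
{
	unsigned int before = leftover(page_size, lo, obj_size);
	unsigned int after = leftover(page_size, hi, obj_size);

	if (after >= before)
		return 0;
	return (unsigned int)((unsigned long long)(before - after) * 1000 /
			      page_size);
}

/* Print the leftover for every order from 0 up to max_order. */
static void print_leftovers(unsigned int page_size, unsigned int obj_size,
			    unsigned int max_order)
{
	unsigned int order;

	for (order = 0; order <= max_order; order++)
		printf("page %6u order %u leftover %4u\n", page_size, order,
		       leftover(page_size, order, obj_size));
}
```

For the assumed 2712-byte object, calling print_leftovers() for orders 0-3
gives leftovers of 1384, 56, 112 and 224 bytes with 4K pages, and 448, 896,
1792 and 872 bytes with 64K pages. Going from order-2 to order-3 on 64K
saves 920 bytes, roughly 1.4% of a page, while going from order-0 to
order-1 on 4K saves 1328 bytes, roughly 32% of a page.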

So I think ideally the calculation would somehow take this into account. The
changes done in v2 as described above are different. It seems that as a
result we can now calculate lower orders on 4K systems than before the patch,
probably due to conditions 2) or 3)? I think it would be best if the patch
resulted only in the same or higher order. It should be enough to tweak some
thresholds for when it makes sense to pay the price of higher order -
whether the reduction of wastage is worth it, in a way that takes the page
size into account.
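
As a rough illustration only, one hypothetical shape for such a threshold
is sketched below in userspace C. Everything in it is an assumption:
pick_order(), leftover_bytes() and GAIN_DIVISOR do not exist in mm/slub.c,
the divisor value is arbitrary, and base_order merely stands in for
whatever order the current code would have picked. It is not a proposed
implementation.

```c
#include <assert.h>

/*
 * Hypothetical tunable: a higher order is accepted only if it saves at
 * least page_size / GAIN_DIVISOR bytes of leftover. The value is arbitrary.
 */
#define GAIN_DIVISOR 16u

/* Leftover bytes in a slab of the given order (slab_size % size). */
static unsigned int leftover_bytes(unsigned int page_size, unsigned int order,
				   unsigned int obj_size)
{
	assert(obj_size != 0);
	return (page_size << order) % obj_size;
}

/*
 * Starting from base_order, move to a higher order only when the saving
 * in leftover is worth at least a fixed fraction of the base page size.
 * The result is never lower than base_order.
 */
static unsigned int pick_order(unsigned int page_size, unsigned int obj_size,
			       unsigned int base_order, unsigned int max_order)
{
	unsigned int best = base_order;
	unsigned int best_waste = leftover_bytes(page_size, base_order, obj_size);
	unsigned int threshold = page_size / GAIN_DIVISOR;
	unsigned int order;

	for (order = base_order + 1; order <= max_order; order++) {
		unsigned int waste = leftover_bytes(page_size, order, obj_size);

		/* Pay for the higher order only if the saving is large enough. */
		if (waste < best_waste && best_waste - waste >= threshold) {
			best = order;
			best_waste = waste;
		}
	}
	return best;
}
```

With an assumed 2712-byte object, pick_order(4096, 2712, 0, 3) moves from
order-0 to order-1, because the 1328 bytes saved exceed the 256-byte
threshold. pick_order(65536, 2712, 2, 3) stays at order-2, because the 920
bytes saved by order-3 are well below the 4096-byte threshold. Because the
threshold scales with the page size, the same rule is stricter on 64K than
on 4K, and since the search starts at base_order it can only return the
same or a higher order.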

Thanks,
Vlastimil

> I have conducted tests on systems with 160 CPUs and 16 CPUs using 4K
> and 64K page sizes. The tests showed that the patch successfully
> reduces both the total slab memory and its wastage without any
> noticeable performance degradation in the hackbench test.
> 
> Test Results are as follows:
> 1) On 160 CPUs with 4K Page size
> 
> +----------------+----------------+----------------+
> |          Total wastage in slub memory            |
> +----------------+----------------+----------------+
> |                | After Boot     | After Hackbench|
> | Normal         | 2090 Kb        | 3204 Kb        |
> | With Patch     | 1825 Kb        | 3088 Kb        |
> | Wastage reduce | ~12%           | ~4%            |
> +----------------+----------------+----------------+
> 
> +-----------------+----------------+----------------+
> |            Total slub memory                      |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 500572         | 713568         |
> | With Patch      | 482036         | 688312         |
> | Memory reduce   | ~4%            | ~3%            |
> +-----------------+----------------+----------------+
> 
> hackbench-process-sockets
> +-------+-----+----------+----------+-----------+
> |             |  Normal  |With Patch|           |
> +-------+-----+----------+----------+-----------+
> | Amean |  1  |  1.3237  |  1.2737  | ( 3.78%)  |
> | Amean |   4 |   1.5923 |   1.6023 | ( -0.63%) |
> | Amean |   7 |   2.3727 |   2.4260 | ( -2.25%) |
> | Amean |  12 |   3.9813 |   4.1290 | ( -3.71%) |
> | Amean |  21 |   6.9680 |   7.0630 | ( -1.36%) |
> | Amean |  30 |  10.1480 |  10.2170 | ( -0.68%) |
> | Amean |  48 |  16.7793 |  16.8780 | ( -0.59%) |
> | Amean |  79 |  28.9537 |  28.8187 | ( 0.47%)  |
> | Amean | 110 |  39.5507 |  40.0157 | ( -1.18%) |
> | Amean | 141 |  51.5670 |  51.8200 | ( -0.49%) |
> | Amean | 172 |  62.8710 |  63.2540 | ( -0.61%) |
> | Amean | 203 |  74.6417 |  75.2520 | ( -0.82%) |
> | Amean | 234 |  86.0853 |  86.5653 | ( -0.56%) |
> | Amean | 265 |  97.9203 |  98.4617 | ( -0.55%) |
> | Amean | 296 | 108.6243 | 109.8770 | ( -1.15%) |
> +-------+-----+----------+----------+-----------+
> 
> 2) On 160 CPUs with 64K Page size
> +-----------------+----------------+----------------+
> |          Total wastage in slub memory             |
> +-----------------+----------------+----------------+
> |                 | After Boot     |After Hackbench |
> | Normal          | 919 Kb         | 1880 Kb        |
> | With Patch      | 807 Kb         | 1684 Kb        |
> | Wastage reduce  | ~12%           | ~10%           |
> +-----------------+----------------+----------------+
> 
> +-----------------+----------------+----------------+
> |            Total slub memory                      |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 1862592        | 3023744        |
> | With Patch      | 1644416        | 2675776        |
> | Memory reduce   | ~12%           | ~11%           |
> +-----------------+----------------+----------------+
> 
> hackbench-process-sockets
> +-------+-----+----------+----------+-----------+
> |             |  Normal  |With Patch|           |
> +-------+-----+----------+----------+-----------+
> | Amean |  1  |  1.2547  |  1.2677  | ( -1.04%) |
> | Amean |   4 |   1.5523 |   1.5783 | ( -1.67%) |
> | Amean |   7 |   2.4157 |   2.3883 | ( 1.13%)  |
> | Amean |  12 |   3.9807 |   3.9793 | ( 0.03%)  |
> | Amean |  21 |   6.9687 |   6.9703 | ( -0.02%) |
> | Amean |  30 |  10.1403 |  10.1297 | ( 0.11%)  |
> | Amean |  48 |  16.7477 |  16.6893 | ( 0.35%)  |
> | Amean |  79 |  27.9510 |  28.0463 | ( -0.34%) |
> | Amean | 110 |  39.6833 |  39.5687 | ( 0.29%)  |
> | Amean | 141 |  51.5673 |  51.4477 | ( 0.23%)  |
> | Amean | 172 |  62.9643 |  63.1647 | ( -0.32%) |
> | Amean | 203 |  74.6220 |  73.7900 | ( 1.11%)  |
> | Amean | 234 |  85.1783 |  85.3420 | ( -0.19%) |
> | Amean | 265 |  96.6627 |  96.7903 | ( -0.13%) |
> | Amean | 296 | 108.2543 | 108.2253 | ( 0.03%)  |
> +-------+-----+----------+----------+-----------+
> 
> 3) On 16 CPUs with 4K Page size
> +-----------------+----------------+------------------+
> |          Total wastage in slub memory               |
> +-----------------+----------------+------------------+
> |                 | After Boot     | After Hackbench  |
> | Normal          | 491 Kb         | 727 Kb           |
> | With Patch      | 483 Kb         | 670 Kb           |
> | Wastage reduce  | ~1%            | ~8%              |
> +-----------------+----------------+------------------+
> 
> +-----------------+----------------+----------------+
> |            Total slub memory                      |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 105340         | 153116         |
> | With Patch      | 103620         | 147412         |
> | Memory reduce   | ~1.6%          | ~4%            |
> +-----------------+----------------+----------------+
> 
> hackbench-process-sockets
> +-------+-----+----------+----------+-----------+
> |             |  Normal  |With Patch|           |
> +-------+-----+----------+----------+-----------+
> | Amean |   1 |   1.0963 |   1.1070 | ( -0.97%) |
> | Amean |   4 |   3.7963 |   3.7957 | ( 0.02%)  |
> | Amean |   7 |   6.5947 |   6.6017 | ( -0.11%) |
> | Amean |  12 |  11.1993 |  11.1730 | ( 0.24%)  |
> | Amean |  21 |  19.4097 |  19.3647 | ( 0.23%)  |
> | Amean |  30 |  27.7023 |  27.6040 | ( 0.35%)  |
> | Amean |  48 |  44.1287 |  43.9630 | ( 0.38%)  |
> | Amean |  64 |  58.8147 |  58.5753 | ( 0.41%)  |
> +-------+-----+----------+----------+-----------+
> 
> 4) On 16 CPUs with 64K Page size
> +----------------+----------------+----------------+
> |          Total wastage in slub memory            |
> +----------------+----------------+----------------+
> |                | After Boot     | After Hackbench|
> | Normal         | 194 Kb         | 349 Kb         |
> | With Patch     | 191 Kb         | 344 Kb         |
> | Wastage reduce | ~1%            | ~1%            |
> +----------------+----------------+----------------+
> 
> +-----------------+----------------+----------------+
> |            Total slub memory                      |
> +-----------------+----------------+----------------+
> |                 | After Boot     | After Hackbench|
> | Normal          | 330304         | 472960         |
> | With Patch      | 319808         | 458944         |
> | Memory reduce   | ~3%            | ~3%            |
> +-----------------+----------------+----------------+
> 
> hackbench-process-sockets
> +-------+----+----------+----------+----------+
> |            |  Normal  |With Patch|          |
> +-------+----+----------+----------+----------+
> | Amean | 1  |  1.9030  |  1.8967  | ( 0.33%) |
> | Amean |  4 |   7.2117 |   7.1283 | ( 1.16%) |
> | Amean |  7 |  12.5247 |  12.3460 | ( 1.43%) |
> | Amean | 12 |  21.7157 |  21.4753 | ( 1.11%) |
> | Amean | 21 |  38.2693 |  37.6670 | ( 1.57%) |
> | Amean | 30 |  54.5930 |  53.8657 | ( 1.33%) |
> | Amean | 48 |  87.6700 |  86.3690 | ( 1.48%) |
> | Amean | 64 | 117.1227 | 115.4893 | ( 1.39%) |
> +-------+----+----------+----------+----------+
> 
> Signed-off-by: Jay Patel <jaypatel@xxxxxxxxxxxxx>
> ---
>  mm/slub.c | 52 +++++++++++++++++++++++++---------------------------
>  1 file changed, 25 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index c87628cd8a9a..0a1090c528da 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4058,7 +4058,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
>   */
>  static unsigned int slub_min_order;
>  static unsigned int slub_max_order =
> -	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> +	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 2;
>  static unsigned int slub_min_objects;
>  
>  /*
> @@ -4087,11 +4087,10 @@ static unsigned int slub_min_objects;
>   * the smallest order which will fit the object.
>   */
>  static inline unsigned int calc_slab_order(unsigned int size,
> -		unsigned int min_objects, unsigned int max_order,
> -		unsigned int fract_leftover)
> +		unsigned int min_objects, unsigned int max_order)
>  {
>  	unsigned int min_order = slub_min_order;
> -	unsigned int order;
> +	unsigned int order, min_wastage = size, min_wastage_order = MAX_ORDER+1;
>  
>  	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
>  		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
> @@ -4104,11 +4103,17 @@ static inline unsigned int calc_slab_order(unsigned int size,
>  
>  		rem = slab_size % size;
>  
> -		if (rem <= slab_size / fract_leftover)
> -			break;
> +		if (rem < min_wastage) {
> +			min_wastage = rem;
> +			min_wastage_order = order;
> +		}
>  	}
>  
> -	return order;
> +	if (min_wastage_order <= slub_max_order)
> +		return min_wastage_order;
> +	else
> +		return order;
> +
>  }
>  
>  static inline int calculate_order(unsigned int size)
> @@ -4142,35 +4147,28 @@ static inline int calculate_order(unsigned int size)
>  			nr_cpus = nr_cpu_ids;
>  		min_objects = 4 * (fls(nr_cpus) + 1);
>  	}
> +
> +	if ((min_objects * size) > (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +		return PAGE_ALLOC_COSTLY_ORDER;
> +
> +	if ((min_objects * size) <= PAGE_SIZE)
> +		return slub_min_order;
> +
>  	max_objects = order_objects(slub_max_order, size);
>  	min_objects = min(min_objects, max_objects);
>  
> -	while (min_objects > 1) {
> -		unsigned int fraction;
> -
> -		fraction = 16;
> -		while (fraction >= 4) {
> -			order = calc_slab_order(size, min_objects,
> -					slub_max_order, fraction);
> -			if (order <= slub_max_order)
> -				return order;
> -			fraction /= 2;
> -		}
> +	while (min_objects >= 1) {
> +		order = calc_slab_order(size, min_objects,
> +		slub_max_order);
> +		if (order <= slub_max_order)
> +			return order;
>  		min_objects--;
>  	}
>  
> -	/*
> -	 * We were unable to place multiple objects in a slab. Now
> -	 * lets see if we can place a single object there.
> -	 */
> -	order = calc_slab_order(size, 1, slub_max_order, 1);
> -	if (order <= slub_max_order)
> -		return order;
> -
>  	/*
>  	 * Doh this slab cannot be placed using slub_max_order.
>  	 */
> -	order = calc_slab_order(size, 1, MAX_ORDER, 1);
> +	order = calc_slab_order(size, 1, MAX_ORDER);
>  	if (order <= MAX_ORDER)
>  		return order;
>  	return -ENOSYS;




