[RFC PATCH] mm/slub: Reduce slub memory wastage

In the current implementation of the slub memory allocator, slab
order selection follows these steps:

1) Determine the minimum order required to hold the minimum number of
objects (min_objects), roughly order = min_objects * object_size /
PAGE_SIZE.

2) If this minimum order is greater than the maximum allowed order
(slub_max_order), use slub_max_order for this slab.

3) Otherwise, iterate from the minimum order up to slub_max_order and
select the first order for which (rem <= slab_size / fract_leftover)
holds. Here, slab_size is (PAGE_SIZE << order), rem is (slab_size %
object_size), and fract_leftover takes the values 16, 8 and 4 in turn.

However, in step 3 the acceptable leftover (slab_size / fract_leftover)
can be large compared to the remainder (rem): at order 0 it ranges from
256 bytes to 1 KB with a 4K page size, and from 4 KB to 16 KB with a
64K page size, growing further at higher orders. This can lead to
selecting an order that wastes more memory than necessary. To mitigate
such wastage, this patch changes step 3 as follows: instead of
selecting the first order that satisfies (rem <= slab_size /
fract_leftover), iterate from min_order to slub_max_order and choose
the order that minimizes memory wastage for the slab.

Consider mm_struct on a system with 160 CPUs, so min_objects is 32, and
assume a page size of 64K. mm_struct is 1536 bytes, so a single page
holds 42 objects, which already exceeds the min_objects requirement.
With the current logic, order 0 is selected for this slab because the
remainder of 1 KB (64 KB % 1536 bytes) is below the 4 KB threshold
(slab_size 64 KB / fract_leftover 16). However, this wastes 1 KB of
memory in every mm_struct slab. With this patch, order 1 (two pages) is
chosen instead, wasting only 512 bytes per mm_struct slab. This reduces
memory wastage for the slab and increases the number of objects per
slab.

I conducted tests on systems with 160 CPUs and 16 CPUs, using both 4K
and 64K page sizes. These tests show that the patch reduces slab memory
wastage without any noticeable performance degradation in hackbench.
Note, however, that the patch also increases the total number of
objects per slab, leading to an overall increase in total slub memory
usage.

Test results are as follows:

1) On 160 CPUs with 4K Page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                | After Boot     | After Hackbench|
| Normal         | 1819 KB        | 3056 KB        |
| With Patch     | 1288 KB        | 2217 KB        |
| Wastage reduce | ~29%           | ~27%           |
+----------------+----------------+----------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 469336         | 725960         |
| With Patch      | 488032         | 726416         |
| Memory increase | ~4%            | ~0.06%         |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|             |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean | 1   | 1.2887   | 1.2143   | ( 5.77%)  |
| Amean | 4   | 1.5633   | 1.5993   | ( -2.30%) |
| Amean | 7   | 2.3993   | 2.3813   | ( 0.75%)  |
| Amean | 12  | 3.9543   | 3.9637   | ( -0.24%) |
| Amean | 21  | 6.9723   | 6.9290   | ( 0.62%)  |
| Amean | 30  | 10.1407  | 10.1067  | ( 0.34%)  |
| Amean | 48  | 16.6730  | 16.6697  | ( 0.02%)  |
| Amean | 79  | 28.6743  | 28.8970  | ( -0.78%) |
| Amean | 110 | 39.0990  | 39.1857  | ( -0.22%) |
| Amean | 141 | 51.2667  | 51.2003  | ( 0.13%)  |
| Amean | 172 | 62.0797  | 62.3190  | ( -0.39%) |
| Amean | 203 | 73.5273  | 74.3567  | ( -1.13%) |
| Amean | 234 | 84.7130  | 85.7940  | ( -1.28%) |
| Amean | 265 | 97.0863  | 96.5810  | ( 0.52%)  |
| Amean | 296 | 108.4597 | 108.2987 | ( 0.15%)  |
+-------+-----+----------+----------+-----------+

2) On 160 CPUs with 64K Page size

+-----------------+----------------+----------------+
|          Total wastage in slub memory             |
+-----------------+----------------+----------------+
|                 | After Boot     |After Hackbench |
| Normal          | 729 KB         | 1597 KB        |
| With Patch      | 512 KB         | 1066 KB        |
| Wastage reduce  | ~30%           | ~33%           |
+-----------------+----------------+----------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 1612608        | 2667200        |
| With Patch      | 2147456        | 3500096        |
| Memory increase | ~33%           | ~31%           |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|             |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean | 1   | 1.2667   | 1.2053   | ( 4.84%)  |
| Amean | 4   | 1.5997   | 1.6453   | ( -2.85%) |
| Amean | 7   | 2.3797   | 2.4017   | ( -0.92%) |
| Amean | 12  | 3.9763   | 3.9987   | ( -0.56%) |
| Amean | 21  | 6.9760   | 6.9917   | ( -0.22%) |
| Amean | 30  | 10.2150  | 10.2093  | ( 0.06%)  |
| Amean | 48  | 16.8080  | 16.7707  | ( 0.22%)  |
| Amean | 79  | 28.2237  | 28.1583  | ( 0.23%)  |
| Amean | 110 | 39.7710  | 39.8420  | ( -0.18%) |
| Amean | 141 | 51.3563  | 51.9233  | ( -1.10%) |
| Amean | 172 | 63.4027  | 63.7463  | ( -0.54%) |
| Amean | 203 | 74.4970  | 74.9327  | ( -0.58%) |
| Amean | 234 | 86.1483  | 85.9420  | ( 0.24%)  |
| Amean | 265 | 97.5137  | 97.6100  | ( -0.10%) |
| Amean | 296 | 109.2327 | 110.2417 | ( -0.92%) |
+-------+-----+----------+----------+-----------+

3) On 16 CPUs with 4K Page size

+-----------------+----------------+------------------+
|          Total wastage in slub memory               |
+-----------------+----------------+------------------+
|                 | After Boot     | After Hackbench  |
| Normal          | 666 KB         | 902 KB           |
| With Patch      | 533 KB         | 694 KB           |
| Wastage reduce  | ~20%           | ~23%             |
+-----------------+----------------+------------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 82360          | 122532         |
| With Patch      | 87372          | 129180         |
| Memory increase | ~6%            | ~5%            |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+----+---------+---------+-----------+
|            |  Normal | Patched |           |
+-------+----+---------+---------+-----------+
| Amean | 1  | 1.4983  | 1.4867  | ( 0.78%)  |
| Amean | 4  | 5.6613  | 5.6793  | ( -0.32%) |
| Amean | 7  | 9.9813  | 9.9873  | ( -0.06%) |
| Amean | 12 | 17.6963 | 17.8527 | ( -0.88%) |
| Amean | 21 | 31.2017 | 31.2060 | ( -0.01%) |
| Amean | 30 | 44.0297 | 44.1750 | ( -0.33%) |
| Amean | 48 | 70.2073 | 69.6210 | ( 0.84%)  |
| Amean | 64 | 92.3257 | 93.7410 | ( -1.53%) |
+-------+----+---------+---------+-----------+

4) On 16 CPUs with 64K Page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                | After Boot     | After Hackbench|
| Normal         | 239 KB         | 484 KB         |
| With Patch     | 135 KB         | 234 KB         |
| Wastage reduce | ~43%           | ~51%           |
+----------------+----------------+----------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 227136         | 328110         |
| With Patch      | 284352         | 451391         |
| Memory increase | ~25%           | ~37%           |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+----+---------+---------+-----------+
|            |  Normal | Patched |           |
+-------+----+---------+---------+-----------+
| Amean | 1  | 1.3597  | 1.3583  | ( 0.10%)  |
| Amean | 4  | 5.2633  | 5.2503  | ( 0.25%)  |
| Amean | 7  | 9.2700  | 9.1710  | ( 1.07%)  |
| Amean | 12 | 16.3730 | 16.3103 | ( 0.38%)  |
| Amean | 21 | 28.7140 | 28.7510 | ( -0.13%) |
| Amean | 30 | 40.3987 | 40.4940 | ( -0.24%) |
| Amean | 48 | 63.8477 | 63.9457 | ( -0.15%) |
| Amean | 64 | 86.4917 | 85.3810 | ( 1.28%)  |
+-------+----+---------+---------+-----------+

Signed-off-by: Jay Patel <jaypatel@xxxxxxxxxxxxx>
---
 mm/slub.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c87628cd8a9a..e0b465173ed3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4087,11 +4087,10 @@ static unsigned int slub_min_objects;
  * the smallest order which will fit the object.
  */
 static inline unsigned int calc_slab_order(unsigned int size,
-		unsigned int min_objects, unsigned int max_order,
-		unsigned int fract_leftover)
+		unsigned int min_objects, unsigned int max_order)
 {
 	unsigned int min_order = slub_min_order;
-	unsigned int order;
+	unsigned int order, min_wastage = size, min_wastage_order = slub_max_order + 1;
 
 	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
 		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
@@ -4104,11 +4103,17 @@ static inline unsigned int calc_slab_order(unsigned int size,
 
 		rem = slab_size % size;
 
-		if (rem <= slab_size / fract_leftover)
-			break;
+		if (rem < min_wastage) {
+			min_wastage = rem;
+			min_wastage_order = order;
+		}
 	}
 
-	return order;
+	if (min_wastage_order <= slub_max_order)
+		return min_wastage_order;
+	else
+		return order;
+
 }
 
 static inline int calculate_order(unsigned int size)
@@ -4145,32 +4150,18 @@ static inline int calculate_order(unsigned int size)
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-	while (min_objects > 1) {
-		unsigned int fraction;
-
-		fraction = 16;
-		while (fraction >= 4) {
-			order = calc_slab_order(size, min_objects,
-					slub_max_order, fraction);
-			if (order <= slub_max_order)
-				return order;
-			fraction /= 2;
-		}
+	while (min_objects >= 1) {
+		order = calc_slab_order(size, min_objects,
+				slub_max_order);
+		if (order <= slub_max_order)
+			return order;
 		min_objects--;
 	}
 
-	/*
-	 * We were unable to place multiple objects in a slab. Now
-	 * lets see if we can place a single object there.
-	 */
-	order = calc_slab_order(size, 1, slub_max_order, 1);
-	if (order <= slub_max_order)
-		return order;
-
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = calc_slab_order(size, 1, MAX_ORDER, 1);
+	order = calc_slab_order(size, 1, MAX_ORDER);
 	if (order <= MAX_ORDER)
 		return order;
 	return -ENOSYS;
-- 
2.31.1
