+ zsmalloc-merge-size_class-to-reduce-fragmentation.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Tue, 14 Oct 2014 13:35:27 -0700

The patch titled
     Subject: zsmalloc: merge size_class to reduce fragmentation
has been added to the -mm tree.  Its filename is
     zsmalloc-merge-size_class-to-reduce-fragmentation.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/zsmalloc-merge-size_class-to-reduce-fragmentation.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/zsmalloc-merge-size_class-to-reduce-fragmentation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Subject: zsmalloc: merge size_class to reduce fragmentation

zsmalloc has many size_classes to reduce fragmentation and they are in 16
bytes unit, for example, 16, 32, 48, etc., if PAGE_SIZE is 4096.  And,
zsmalloc has constraint that each zspage has 4 pages at maximum.

In this situation, we can see interesting aspect.  Let's think about
size_class for 1488, 1472, ..., 1376.  To prevent external fragmentation,
they uses 4 pages per zspage and so all they can contain 11 objects at
maximum.

16384 (4096 * 4) = 1488 * 11 + remains
16384 (4096 * 4) = 1472 * 11 + remains
16384 (4096 * 4) = ...
16384 (4096 * 4) = 1376 * 11 + remains

It means that they have same characteristics and classification between
them isn't needed.  If we use one size_class for them, we can reduce
fragementation and save some memory since both the 1488 and 1472 sized
classes can only fit 11 objects into 4 pages, and an object that's 1472
bytes can fit into an object that's 1488 bytes, merging these classes to
always use objects that are 1488 bytes will reduce the total number of
size classes.  And reducing the total number of size classes reduces
overall fragmentation, because a wider range of compressed pages can fit
into a single size class, leaving less unused objects in each size class.

For this purpose, this patch implement size_class merging.  If there is
size_class that have same pages_per_zspage and same number of objects per
zspage with previous size_class, we don't create new size_class.  Instead,
we use previous, same characteristic size_class.  With this way, above
example sizes (1488, 1472, ..., 1376) use just one size_class so we can
get much more memory utilization.

Below is result of my simple test.

TEST ENV: EXT4 on zram, mount with discard option WORKLOAD: untar kernel
source code, remove directory in descending order in size.  (drivers arch
fs sound include net Documentation firmware kernel tools)

Each line represents orig_data_size, compr_data_size, mem_used_total,
fragmentation overhead (mem_used - compr_data_size) and overhead ratio
(overhead to compr_data_size), respectively, after untar and remove
operation is executed.

* untar-nomerge.out

orig_size compr_size used_size overhead overhead_ratio
525.88MB 199.16MB 210.23MB  11.08MB 5.56%
288.32MB  97.43MB 105.63MB   8.20MB 8.41%
177.32MB  61.12MB  69.40MB   8.28MB 13.55%
146.47MB  47.32MB  56.10MB   8.78MB 18.55%
124.16MB  38.85MB  48.41MB   9.55MB 24.58%
103.93MB  31.68MB  40.93MB   9.25MB 29.21%
 84.34MB  22.86MB  32.72MB   9.86MB 43.13%
 66.87MB  14.83MB  23.83MB   9.00MB 60.70%
 60.67MB  11.11MB  18.60MB   7.49MB 67.48%
 55.86MB   8.83MB  16.61MB   7.77MB 88.03%
 53.32MB   8.01MB  15.32MB   7.31MB 91.24%

* untar-merge.out

orig_size compr_size used_size overhead overhead_ratio
526.23MB 199.18MB 209.81MB  10.64MB 5.34%
288.68MB  97.45MB 104.08MB   6.63MB 6.80%
177.68MB  61.14MB  66.93MB   5.79MB 9.47%
146.83MB  47.34MB  52.79MB   5.45MB 11.51%
124.52MB  38.87MB  44.30MB   5.43MB 13.96%
104.29MB  31.70MB  36.83MB   5.13MB 16.19%
 84.70MB  22.88MB  27.92MB   5.04MB 22.04%
 67.11MB  14.83MB  19.26MB   4.43MB 29.86%
 60.82MB  11.10MB  14.90MB   3.79MB 34.17%
 55.90MB   8.82MB  12.61MB   3.79MB 42.97%
 53.32MB   8.01MB  11.73MB   3.73MB 46.53%

As you can see above result, merged one has better utilization (overhead
ratio, 5th column) and uses less memory (mem_used_total, 3rd column).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Nitin Gupta <ngupta@xxxxxxxxxx>
Cc: Jerome Marchand <jmarchan@xxxxxxxxxx>
Cc: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
Cc: Dan Streetman <ddstreet@xxxxxxxx>
Cc: Luigi Semenzato <semenzato@xxxxxxxxxx>
Cc: <juno.choi@xxxxxxx>
Cc: "seungho1.park" <seungho1.park@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/zsmalloc.c |   80 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 66 insertions(+), 14 deletions(-)

diff -puN mm/zsmalloc.c~zsmalloc-merge-size_class-to-reduce-fragmentation mm/zsmalloc.c

--- a/mm/zsmalloc.c~zsmalloc-merge-size_class-to-reduce-fragmentation
+++ a/mm/zsmalloc.c
@@ -214,7 +214,7 @@ struct link_free {
 };
 
 struct zs_pool {
-	struct size_class size_class[ZS_SIZE_CLASSES];
+	struct size_class *size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
 	atomic_long_t pages_allocated;
@@ -468,7 +468,7 @@ static enum fullness_group fix_fullness_
 	if (newfg == currfg)
 		goto out;
 
-	class = &pool->size_class[class_idx];
+	class = pool->size_class[class_idx];
 	remove_zspage(page, class, currfg);
 	insert_zspage(page, class, newfg);
 	set_zspage_mapping(page, class_idx, newfg);
@@ -925,6 +925,23 @@ fail:
 	return notifier_to_errno(ret);
 }
 
+static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage)
+{
+	return pages_per_zspage * PAGE_SIZE / size;
+}
+
+static bool can_merge(struct size_class *prev, int size, int pages_per_zspage)
+{
+	if (prev->pages_per_zspage != pages_per_zspage)
+		return false;
+
+	if (get_maxobj_per_zspage(prev->size, prev->pages_per_zspage)
+		!= get_maxobj_per_zspage(size, pages_per_zspage))
+		return false;
+
+	return true;
+}
+
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @flags: allocation flags used to allocate pool metadata
@@ -945,25 +962,56 @@ struct zs_pool *zs_create_pool(gfp_t fla
 	if (!pool)
 		return NULL;
 
-	for (i = 0; i < ZS_SIZE_CLASSES; i++) {
+	/*
+	 * Iterate reversly, because, size of size_class that we want to use
+	 * for merging should be larger or equal to current size.
+	 */
+	for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
 		int size;
+		int pages_per_zspage;
 		struct size_class *class;
+		struct size_class *prev_class;
 
 		size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
 		if (size > ZS_MAX_ALLOC_SIZE)
 			size = ZS_MAX_ALLOC_SIZE;
+		pages_per_zspage = get_pages_per_zspage(size);
+
+		/*
+		 * size_class is used for normal zsmalloc operation such
+		 * as alloc/free for that size. Although it is natural that we
+		 * have one size_class for each size, there is a chance that we
+		 * can get more memory utilization if we use one size_class for
+		 * many different sizes whose size_class have same
+		 * characteristics. So, we makes size_class point to
+		 * previous size_class if possible.
+		 */
+		if (i < ZS_SIZE_CLASSES - 1) {
+			prev_class = pool->size_class[i + 1];
+			if (can_merge(prev_class, size, pages_per_zspage)) {
+				pool->size_class[i] = prev_class;
+				continue;
+			}
+		}
+
+		class = kzalloc(sizeof(struct size_class), GFP_KERNEL);
+		if (!class)
+			goto err;
 
-		class = &pool->size_class[i];
 		class->size = size;
 		class->index = i;
+		class->pages_per_zspage = pages_per_zspage;
 		spin_lock_init(&class->lock);
-		class->pages_per_zspage = get_pages_per_zspage(size);
-
+		pool->size_class[i] = class;
 	}
 
 	pool->flags = flags;
 
 	return pool;
+
+err:
+	zs_destroy_pool(pool);
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(zs_create_pool);
 
@@ -973,7 +1021,13 @@ void zs_destroy_pool(struct zs_pool *poo
 
 	for (i = 0; i < ZS_SIZE_CLASSES; i++) {
 		int fg;
-		struct size_class *class = &pool->size_class[i];
+		struct size_class *class = pool->size_class[i];
+
+		if (!class)
+			continue;
+
+		if (class->index != i)
+			continue;
 
 		for (fg = 0; fg < _ZS_NR_FULLNESS_GROUPS; fg++) {
 			if (class->fullness_list[fg]) {
@@ -981,6 +1035,7 @@ void zs_destroy_pool(struct zs_pool *poo
 					class->size, fg);
 			}
 		}
+		kfree(class);
 	}
 	kfree(pool);
 }
@@ -999,7 +1054,6 @@ unsigned long zs_malloc(struct zs_pool *
 {
 	unsigned long obj;
 	struct link_free *link;
-	int class_idx;
 	struct size_class *class;
 
 	struct page *first_page, *m_page;
@@ -1008,9 +1062,7 @@ unsigned long zs_malloc(struct zs_pool *
 	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
 		return 0;
 
-	class_idx = get_size_class_index(size);
-	class = &pool->size_class[class_idx];
-	BUG_ON(class_idx != class->index);
+	class = pool->size_class[get_size_class_index(size)];
 
 	spin_lock(&class->lock);
 	first_page = find_get_zspage(class);
@@ -1063,7 +1115,7 @@ void zs_free(struct zs_pool *pool, unsig
 	first_page = get_first_page(f_page);
 
 	get_zspage_mapping(first_page, &class_idx, &fullness);
-	class = &pool->size_class[class_idx];
+	class = pool->size_class[class_idx];
 	f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
 
 	spin_lock(&class->lock);
@@ -1124,7 +1176,7 @@ void *zs_map_object(struct zs_pool *pool
 
 	obj_handle_to_location(handle, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
-	class = &pool->size_class[class_idx];
+	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	area = &get_cpu_var(zs_map_area);
@@ -1158,7 +1210,7 @@ void zs_unmap_object(struct zs_pool *poo
 
 	obj_handle_to_location(handle, &page, &obj_idx);
 	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
-	class = &pool->size_class[class_idx];
+	class = pool->size_class[class_idx];
 	off = obj_idx_to_offset(page, obj_idx, class->size);
 
 	area = this_cpu_ptr(&zs_map_area);
_

Patches currently in -mm which might be from iamjoonsoo.kim@xxxxxxx are

origin.patch
zsmalloc-merge-size_class-to-reduce-fragmentation.patch
page-owners-correct-page-order-when-to-free-page.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html