+ mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock.patch added to -mm tree

The patch titled
     Subject: mm, compaction: wrap calculating first and last pfn of pageblock
has been added to the -mm tree.  Its filename is
     mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Vlastimil Babka <vbabka@xxxxxxx>
Subject: mm, compaction: wrap calculating first and last pfn of pageblock

The goal here is to reduce the latency (and increase the success rate) of
direct async compaction by making it focus more on creating a high-order
page, at some expense of thoroughness.

This is based on an older attempt [1] which I didn't finish, as it seemed
to increase longer-term fragmentation.  Now it seems it doesn't, and we
have kcompactd for that goal anyway.  The main patch (3) makes the
migration scanner skip whole order-aligned blocks as soon as isolation
fails in them, since it takes just one unmigrated page to prevent a
high-order buddy page from fully merging.
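
Purely as an illustration of that idea (not the actual patch 3 code; the
helper name and its use of struct compact_control are assumptions made for
this example), the skipping amounts to jumping to the next order-aligned
boundary once isolation fails, reusing the block_end_pfn() helper
introduced by this patch:

	/*
	 * Hypothetical sketch: when isolating a page for migration fails
	 * during async direct compaction, give up on the rest of the
	 * order-aligned block, since a single unmigrated page already
	 * prevents the buddy merge at cc->order.
	 */
	static unsigned long skip_failed_block(struct compact_control *cc,
					       unsigned long pfn)
	{
		/* first pfn past the order-aligned block containing pfn */
		return block_end_pfn(pfn, cc->order);
	}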

Patch 4 then attempts to reduce the excessive freepage scanning (such as
that reported in [2]) by allocating migration targets directly from
freelists.  Here we just need to make sure that the free pages do not come
from the same block as the pages being migrated.  This is also limited to
direct async compaction and is not meant to replace the more thorough free
scanner for other scenarios.

[1] https://lkml.org/lkml/2014/7/16/988
[2] http://www.spinics.net/lists/linux-mm/msg97475.html
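
Again as a rough sketch of the patch 4 idea above (not the actual
implementation; the function name here is made up), the key check is that
a candidate free page must not sit in the pageblock the migration scanner
is currently working on:

	/*
	 * Hypothetical sketch: a page taken directly from the freelists is
	 * only a valid migration target if it is not in the same pageblock
	 * that is currently being migrated from.
	 */
	static bool suitable_free_target(struct compact_control *cc,
					 struct page *freepage)
	{
		unsigned long free_pfn = page_to_pfn(freepage);

		return pageblock_start_pfn(free_pfn) !=
		       pageblock_start_pfn(cc->migrate_pfn);
	}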

Testing was done using stress-highalloc from mmtests, configured for order-4
GFP_KERNEL allocations:

                              4.6-rc1               4.6-rc1               4.6-rc1
                               patch2                patch3                patch4
Success 1 Min         24.00 (  0.00%)       27.00 (-12.50%)       43.00 (-79.17%)
Success 1 Mean        30.20 (  0.00%)       31.60 ( -4.64%)       51.60 (-70.86%)
Success 1 Max         37.00 (  0.00%)       35.00 (  5.41%)       73.00 (-97.30%)
Success 2 Min         42.00 (  0.00%)       32.00 ( 23.81%)       73.00 (-73.81%)
Success 2 Mean        44.00 (  0.00%)       44.80 ( -1.82%)       78.00 (-77.27%)
Success 2 Max         48.00 (  0.00%)       52.00 ( -8.33%)       81.00 (-68.75%)
Success 3 Min         91.00 (  0.00%)       92.00 ( -1.10%)       88.00 (  3.30%)
Success 3 Mean        92.20 (  0.00%)       92.80 ( -0.65%)       91.00 (  1.30%)
Success 3 Max         94.00 (  0.00%)       93.00 (  1.06%)       94.00 (  0.00%)

While the eager skipping of unsuitable blocks from patch 3 didn't affect
success rates, direct freepage allocation did improve them.

             4.6-rc1     4.6-rc1     4.6-rc1
              patch2      patch3      patch4
User         2587.42     2566.53     2413.57
System        482.89      471.20      461.71
Elapsed      1395.68     1382.00     1392.87

Times are not a very useful metric for this benchmark, as the main portion
is spent in the interfering kernel builds, but the results do hint at
reduced system times.

                                   4.6-rc1     4.6-rc1     4.6-rc1
                                    patch2      patch3      patch4
Direct pages scanned                163614      159608      123385
Kswapd pages scanned               2070139     2078790     2081385
Kswapd pages reclaimed             2061707     2069757     2073723
Direct pages reclaimed              163354      159505      122304

Reduced direct reclaim was unintended, but can be explained by the more
successful first attempt at (async) direct compaction, which happens
before the first reclaim attempt in __alloc_pages_slowpath().
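
Schematically (a simplified sketch of the slowpath ordering described
above, not the literal 4.6 code), that ordering looks like:

	/* inside __alloc_pages_slowpath(), simplified: */
	page = __alloc_pages_direct_compact(...);  /* async compaction first */
	if (page)
		goto got_pg;
	page = __alloc_pages_direct_reclaim(...);  /* reclaim only afterwards */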

Compaction stalls                    33052       39853       55091
Compaction success                   12121       19773       37875
Compaction failures                  20931       20079       17216

Compaction is indeed more successful, and thus less likely to get
deferred, so there are also more direct compaction stalls.  

Page migrate success               3781876     3326819     2790838
Page migrate failure                 45817       41774       38113
Compaction pages isolated          7868232     6941457     5025092
Compaction migrate scanned       168160492   127269354    87087993
Compaction migrate prescanned            0           0           0
Compaction free scanned         2522142582  2326342620   743205879
Compaction free direct alloc             0           0      920792
Compaction free dir. all. miss           0           0        5865
Compaction cost                       5252        4476        3602

Patch 3 reduces migration scanned pages by 25% thanks to the eager
skipping.  Patch 4 reduces free scanned pages by 70%.  The portion of
direct allocation misses to all direct allocations is less than 1%, which
should be acceptable.  Interestingly, patch 4 also reduces migration
scanned pages by another 30% on top of patch 3.  The reason is not clear,
but we can rejoice nevertheless.


This patch (of 4):

Compaction code has accumulated numerous instances of manual calculations
of the first (inclusive) and last (exclusive) pfn of a pageblock (or a
smaller block of a given order), given a pfn within the pageblock.  Wrap
these calculations by introducing pageblock_start_pfn(pfn) and
pageblock_end_pfn(pfn) macros, built on order-parameterized
block_start_pfn() and block_end_pfn() helpers.
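
For example (illustrative values only, assuming pageblock_order == 9 so
that a pageblock spans 512 pages, as on x86_64 with 4KB base pages and 2MB
huge pages):

	unsigned long pfn = 1000;

	pageblock_start_pfn(pfn);	/* round_down(1000, 512) ==  512 */
	pageblock_end_pfn(pfn);		/* ALIGN(1000 + 1, 512)  == 1024 */
	block_start_pfn(pfn, 3);	/* round_down(1000, 8)   == 1000 */
	block_end_pfn(pfn, 3);		/* ALIGN(1000 + 1, 8)    == 1008 */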

Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/compaction.c |   33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff -puN mm/compaction.c~mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock mm/compaction.c
--- a/mm/compaction.c~mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock
+++ a/mm/compaction.c
@@ -42,6 +42,11 @@ static inline void count_compact_events(
 #define CREATE_TRACE_POINTS
 #include <trace/events/compaction.h>
 
+#define block_start_pfn(pfn, order)	round_down(pfn, 1UL << (order))
+#define block_end_pfn(pfn, order)	ALIGN((pfn) + 1, 1UL << (order))
+#define pageblock_start_pfn(pfn)	block_start_pfn(pfn, pageblock_order)
+#define pageblock_end_pfn(pfn)		block_end_pfn(pfn, pageblock_order)
+
 static unsigned long release_freepages(struct list_head *freelist)
 {
 	struct page *page, *next;
@@ -161,7 +166,7 @@ static void reset_cached_positions(struc
 	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
 	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
 	zone->compact_cached_free_pfn =
-			round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages);
+				pageblock_start_pfn(zone_end_pfn(zone) - 1);
 }
 
 /*
@@ -519,10 +524,10 @@ isolate_freepages_range(struct compact_c
 	LIST_HEAD(freelist);
 
 	pfn = start_pfn;
-	block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	block_start_pfn = pageblock_start_pfn(pfn);
 	if (block_start_pfn < cc->zone->zone_start_pfn)
 		block_start_pfn = cc->zone->zone_start_pfn;
-	block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+	block_end_pfn = pageblock_end_pfn(pfn);
 
 	for (; pfn < end_pfn; pfn += isolated,
 				block_start_pfn = block_end_pfn,
@@ -538,8 +543,8 @@ isolate_freepages_range(struct compact_c
 		 * scanning range to right one.
 		 */
 		if (pfn >= block_end_pfn) {
-			block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
-			block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+			block_start_pfn = pageblock_start_pfn(pfn);
+			block_end_pfn = pageblock_end_pfn(pfn);
 			block_end_pfn = min(block_end_pfn, end_pfn);
 		}
 
@@ -839,10 +844,10 @@ isolate_migratepages_range(struct compac
 
 	/* Scan block by block. First and last block may be incomplete */
 	pfn = start_pfn;
-	block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	block_start_pfn = pageblock_start_pfn(pfn);
 	if (block_start_pfn < cc->zone->zone_start_pfn)
 		block_start_pfn = cc->zone->zone_start_pfn;
-	block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+	block_end_pfn = pageblock_end_pfn(pfn);
 
 	for (; pfn < end_pfn; pfn = block_end_pfn,
 				block_start_pfn = block_end_pfn,
@@ -937,10 +942,10 @@ static void isolate_freepages(struct com
 	 * is using.
 	 */
 	isolate_start_pfn = cc->free_pfn;
-	block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
+	block_start_pfn = pageblock_start_pfn(cc->free_pfn);
 	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
 						zone_end_pfn(zone));
-	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
+	low_pfn = pageblock_end_pfn(cc->migrate_pfn);
 
 	/*
 	 * Isolate free pages until enough are available to migrate the
@@ -1094,12 +1099,12 @@ static isolate_migrate_t isolate_migrate
 	 * initialized by compact_zone()
 	 */
 	low_pfn = cc->migrate_pfn;
-	block_start_pfn = cc->migrate_pfn & ~(pageblock_nr_pages - 1);
+	block_start_pfn = pageblock_start_pfn(low_pfn);
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
 
 	/* Only scan within a pageblock boundary */
-	block_end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+	block_end_pfn = pageblock_end_pfn(low_pfn);
 
 	/*
 	 * Iterate over whole pageblocks until we find the first suitable.
@@ -1356,7 +1361,7 @@ static int compact_zone(struct zone *zon
 	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
 	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
-		cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages);
+		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
 		zone->compact_cached_free_pfn = cc->free_pfn;
 	}
 	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
@@ -1424,7 +1429,7 @@ check_drain:
 		if (cc->order > 0 && cc->last_migrated_pfn) {
 			int cpu;
 			unsigned long current_block_start =
-				cc->migrate_pfn & ~((1UL << cc->order) - 1);
+				block_start_pfn(cc->migrate_pfn, cc->order);
 
 			if (cc->last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
@@ -1449,7 +1454,7 @@ out:
 		cc->nr_freepages = 0;
 		VM_BUG_ON(free_pfn == 0);
 		/* The cached pfn is always the first in a pageblock */
-		free_pfn &= ~(pageblock_nr_pages-1);
+		free_pfn = pageblock_start_pfn(free_pfn);
 		/*
 		 * Only go back, not forward. The cached pfn might have been
 		 * already reset to zone end in compact_finished()
_

Patches currently in -mm which might be from vbabka@xxxxxxx are

mm-compaction-wrap-calculating-first-and-last-pfn-of-pageblock.patch
mm-compaction-reduce-spurious-pcplist-drains.patch
mm-compaction-skip-blocks-where-isolation-fails-in-async-direct-compaction.patch
mm-compaction-direct-freepage-allocation-for-async-direct-compaction.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


