On 06/12/2014 04:20 AM, Minchan Kim wrote:
On Wed, Jun 11, 2014 at 04:56:49PM +0200, Vlastimil Babka wrote:
On 06/09/2014 11:26 AM, Vlastimil Babka wrote:
Compaction uses watermark checking to determine if it succeeded in creating
a high-order free page. My testing has shown that this is quite racy and it
can happen that watermark checking in compaction succeeds, and moments later
the watermark checking in page allocation fails, even though the number of
free pages has increased meanwhile.
It should be more reliable if direct compaction captured the high-order free
page as soon as it is detected, and passed it back to the allocation. This would
also reduce the window for somebody else to allocate the free page.
This has already been implemented by 1fb3f8ca0e92 ("mm: compaction: capture a
suitable high-order page immediately when it is made available"), but later
reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable
high-order page") due to flaws.
This patch differs from the previous attempt in two aspects:
1) The previous patch scanned free lists to capture the page. In this patch,
only the cc->order aligned block that the migration scanner just finished
is considered, but only if pages were actually isolated for migration in
that block. Tracking cc->order aligned blocks also has benefits for the
following patch that skips blocks where non-migratable pages were found.
Generally I like this.
Thanks.
2) In this patch, the isolated free page is allocated through extending
get_page_from_freelist() and buffered_rmqueue(). This ensures that it gets
all the operations, such as prep_new_page() and the page->pfmemalloc setting,
that were missing in the previous attempt, that zone statistics are updated, etc.
But this part is a problem.
Capturing is not common, but you are adding more overhead to the hotpath for a
rare case that is even OK to fail, so it's not a good deal.
In such a case, we have no choice but to do the things you mentioned (e.g., statistics,
prep_new_page, pfmemalloc) manually in __alloc_pages_direct_compact.
OK, I will try.
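Something like the following might work; a minimal untested sketch against the
~3.15 sources, where prep_new_page() and zone_statistics() are helpers local to
mm/page_alloc.c, so this assumes the code sits next to
__alloc_pages_direct_compact(). The helper name finish_captured_page() is made
up for illustration:

/*
 * Sketch only: hand a captured page back without going through
 * get_page_from_freelist()/buffered_rmqueue(), repeating the accounting
 * and preparation those paths would normally do.
 */
static struct page *
finish_captured_page(struct zone *preferred_zone, struct zone *zone,
		     struct page *page, int order, gfp_t gfp_mask)
{
	/* counters that buffered_rmqueue() would have bumped */
	__count_zone_vm_events(PGALLOC, zone, 1 << order);
	zone_statistics(preferred_zone, zone, gfp_mask);

	/* bad-page checks and initialization from prep_new_page() */
	if (prep_new_page(page, order, gfp_mask))
		return NULL;

	/* get_page_from_freelist() would normally set this */
	page->pfmemalloc = false;

	return page;
}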
Evaluation is pending.
Uh, so if anyone wants to test it, here's a fixed version; initial evaluation
showed that the original did not actually capture anything (which should not affect
patch 10/10, though), and debugging this took a while.
- for pageblock_order (i.e. THP), capture was never attempted, as the for loop
  in isolate_migratepages_range() ended right before the
  low_pfn == next_capture_pfn check (see the illustration after this list)
- lru_add_drain() has to be done before the pcplists are drained. This made a big
  difference (~50 successful captures -> ~1300 successful captures)
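To illustrate the first point (shape only, not the actual code): with end_pfn
pageblock-aligned, the last capture candidate for cc->order == pageblock_order
has next_capture_pfn == end_pfn, so a check placed only inside the loop body can
never fire for it:

for (; low_pfn < end_pfn; low_pfn++) {
	/*
	 * For the last block, next_capture_pfn == end_pfn, so the loop
	 * terminates before this condition can ever become true.
	 */
	if (low_pfn == next_capture_pfn) {
		/* attempt capture of the just-finished block */
	}
	...
}
/* the fixed version also has to consider the final block here */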
Note that __alloc_pages_direct_compact() is missing lru_add_drain() as well, and
all the existing watermark-based compaction termination decisions (which happen
before the drain in __alloc_pages_direct_compact()) don't do any draining at all.
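A minimal sketch of the corrected ordering, against the ~3.15
__alloc_pages_direct_compact(), which currently only does the drain_pages()
part:

/*
 * Presumably, pages freed during migration can still be pinned on the
 * per-cpu pagevecs; flush those first so they actually get freed, then
 * drain the pcplists so the buddy allocator can merge them.
 */
lru_add_drain();
drain_pages(get_cpu());
put_cpu();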
-----8<-----
From: Vlastimil Babka <vbabka@xxxxxxx>
Date: Wed, 28 May 2014 17:05:18 +0200
Subject: [PATCH fixed 09/10] mm, compaction: try to capture the just-created
high-order freepage
Compaction uses watermark checking to determine if it succeeded in creating
a high-order free page. My testing has shown that this is quite racy and it
can happen that watermark checking in compaction succeeds, and moments later
the watermark checking in page allocation fails, even though the number of
free pages has increased meanwhile.
It should be more reliable if direct compaction captured the high-order free
page as soon as it is detected, and passed it back to the allocation. This would
also reduce the window for somebody else to allocate the free page.
This has already been implemented by 1fb3f8ca0e92 ("mm: compaction: capture a
suitable high-order page immediately when it is made available"), but later
reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable
high-order page") due to flaws.
This patch differs from the previous attempt in two aspects:
1) The previous patch scanned free lists to capture the page. In this patch,
only the cc->order aligned block that the migration scanner just finished
is considered, but only if pages were actually isolated for migration in
that block. Tracking cc->order aligned blocks also has benefits for the
following patch that skips blocks where non-migratable pages were found.
2) In this patch, the isolated free page is allocated through extending
get_page_from_freelist() and buffered_rmqueue(). This ensures that it gets
all the operations, such as prep_new_page() and the page->pfmemalloc setting,
that were missing in the previous attempt, that zone statistics are updated, etc.
Evaluation is pending.
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
---
include/linux/compaction.h | 5 ++-
mm/compaction.c | 103 +++++++++++++++++++++++++++++++++++++++++++--
mm/internal.h | 2 +
mm/page_alloc.c | 69 ++++++++++++++++++++++++------
4 files changed, 161 insertions(+), 18 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 01e3132..69579f5 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -10,6 +10,8 @@
#define COMPACT_PARTIAL 2
/* The full zone was compacted */
#define COMPACT_COMPLETE 3
+/* Captured a high-order free page in direct compaction */
+#define COMPACT_CAPTURED 4
#ifdef CONFIG_COMPACTION
extern int sysctl_compact_memory;
@@ -22,7 +24,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
- enum migrate_mode mode, bool *contended);
+ enum migrate_mode mode, bool *contended,
+ struct page **captured_page);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order);
diff --git a/mm/compaction.c b/mm/compaction.c
index d1e30ba..2988758 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -541,6 +541,16 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
const isolate_mode_t mode = (cc->mode == MIGRATE_ASYNC ?
ISOLATE_ASYNC_MIGRATE : 0) |
(unevictable ? ISOLATE_UNEVICTABLE : 0);
+ unsigned long capture_pfn = 0; /* current candidate for capturing */
+ unsigned long next_capture_pfn = 0; /* next candidate for capturing */
+
+ if (cc->order > PAGE_ALLOC_COSTLY_ORDER
+ && gfpflags_to_migratetype(cc->gfp_mask) == MIGRATE_MOVABLE
+ && cc->order <= pageblock_order) {
You sent this with an RFC mark, so I will not review it in detail, just the design.
Why does capture only work for a limited range of high orders?
I thought the overhead of maintaining the pfns and trying the capture
would be a bad tradeoff for low-order compactions, which I suppose have a
good chance of succeeding even without capture. But I admit I don't have
data to support this yet.
Direct compaction is a really costly operation for the process, paid for with its
own resources (i.e., its timeslice), so anyone who does direct compaction
deserves to get the result, regardless of order.
Another question: why couldn't the capture also work for MIGRATE_CMA?
CMA allocations don't go through standard direct compaction. They also
use memory isolation to prevent parallel activity from stealing the
pages freed by compaction. And importantly, they set cc->order = -1, as
the goal is not to compact a single high-order page but an arbitrarily
long range.
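For reference, alloc_contig_range() in mm/page_alloc.c sets up its
compact_control along these lines (quoted approximately from the ~3.16 tree, so
treat the exact fields as an assumption):

struct compact_control cc = {
	.nr_migratepages = 0,
	.order = -1,		/* no single target order */
	.zone = page_zone(pfn_to_page(start)),
	.mode = MIGRATE_SYNC,
	.ignore_skip_hint = true,
};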