Re: [PATCH v3 05/13] mm, compaction: report compaction as contended only due to lock contention

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 11 Jul 2014 10:28:06 +0200

On 06/23/2014 03:39 AM, Minchan Kim wrote:
Hello Vlastimil,

On Fri, Jun 20, 2014 at 05:49:35PM +0200, Vlastimil Babka wrote:
Async compaction aborts when it detects zone lock contention or need_resched()
is true. David Rientjes has reported that in practice, most direct async
compactions for THP allocation abort due to need_resched(). This means that a
second direct compaction is never attempted, which might be OK for a page
fault, but khugepaged is intended to attempt a sync compaction in such case and
in these cases it won't.

This patch replaces "bool contended" in compact_control with an enum that
distinguieshes between aborting due to need_resched() and aborting due to lock
contention. This allows propagating the abort through all compaction functions
as before, but declaring the direct compaction as contended only when lock
contention has been detected.

A second problem is that try_to_compact_pages() did not act upon the reported
contention (both need_resched() or lock contention) and could proceed with
another zone from the zonelist. When need_resched() is true, that means
initializing another zone compaction, only to check again need_resched() in
isolate_migratepages() and aborting. For zone lock contention, the unintended
consequence is that the contended status reported back to the allocator
is decided from the last zone where compaction was attempted, which is rather
arbitrary.

This patch fixes the problem in the following way:
- need_resched() being true after async compaction returned from a zone means
   that further zones should not be tried. We do a cond_resched() so that we
   do not hog the CPU, and abort. "contended" is reported as false, since we
   did not fail due to lock contention.
- aborting zone compaction due to lock contention means we can still try
   another zone, since it has different locks. We report back "contended" as
   true only if *all* zones where compaction was attempted, it aborted due to
   lock contention.

As a result of these fixes, khugepaged will proceed with second sync compaction
as intended, when the preceding async compaction aborted due to need_resched().
Page fault compactions aborting due to need_resched() will spare some cycles
previously wasted by initializing another zone compaction only to abort again.
Lock contention will be reported only when compaction in all zones aborted due
to lock contention, and therefore it's not a good idea to try again after
reclaim.

Reported-by: David Rientjes <rientjes@xxxxxxxxxx>
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
---
  mm/compaction.c | 48 +++++++++++++++++++++++++++++++++++++++---------
  mm/internal.h   | 15 +++++++++++----
  2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ebe30c9..e8cfac9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -180,9 +180,14 @@ static void update_pageblock_skip(struct compact_control *cc,
  }
  #endif /* CONFIG_COMPACTION */

-static inline bool should_release_lock(spinlock_t *lock)
+enum compact_contended should_release_lock(spinlock_t *lock)
  {
-	return need_resched() || spin_is_contended(lock);
+	if (spin_is_contended(lock))
+		return COMPACT_CONTENDED_LOCK;
+	else if (need_resched())
+		return COMPACT_CONTENDED_SCHED;
+	else
+		return COMPACT_CONTENDED_NONE;

If you want to raise priority of lock contention than need_resched
intentionally, please write it down on comment.

  }

  /*
@@ -197,7 +202,9 @@ static inline bool should_release_lock(spinlock_t *lock)
  static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
  				      bool locked, struct compact_control *cc)
  {
-	if (should_release_lock(lock)) {
+	enum compact_contended contended = should_release_lock(lock);
+
+	if (contended) {
  		if (locked) {
  			spin_unlock_irqrestore(lock, *flags);
  			locked = false;
@@ -205,7 +212,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,

  		/* async aborts if taking too long or contended */
  		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = true;
+			cc->contended = contended;
  			return false;
  		}



@@ -231,7 +238,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
  	/* async compaction aborts if contended */
  	if (need_resched()) {
  		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = true;
+			cc->contended = COMPACT_CONTENDED_SCHED;
  			return true;
  		}

@@ -1101,7 +1108,8 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
  	VM_BUG_ON(!list_empty(&cc.freepages));
  	VM_BUG_ON(!list_empty(&cc.migratepages));

-	*contended = cc.contended;
+	/* We only signal lock contention back to the allocator */
+	*contended = cc.contended == COMPACT_CONTENDED_LOCK;

Please write down *WHY* as well as your intention we can know by looking at code.

  	return ret;
  }

@@ -1132,6 +1140,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
  	struct zone *zone;
  	int rc = COMPACT_SKIPPED;
  	int alloc_flags = 0;
+	bool all_zones_contended = true;

  	/* Check if the GFP flags allow compaction */
  	if (!order || !may_enter_fs || !may_perform_io)
@@ -1146,6 +1155,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
  	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
  								nodemask) {
  		int status;
+		bool zone_contended;

  		if (compaction_deferred(zone, order))
  			continue;
@@ -1153,8 +1163,9 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
  		*deferred = false;

  		status = compact_zone_order(zone, order, gfp_mask, mode,
-						contended);
+							&zone_contended);
  		rc = max(status, rc);
+		all_zones_contended &= zone_contended;

  		/* If a normal allocation would succeed, stop compacting */
  		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0,
@@ -1168,12 +1179,31 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
  			 * succeeding after all, it will be reset.
  			 */
  			defer_compaction(zone, order);
+			/*
+			 * If we stopped compacting due to need_resched(), do
+			 * not try further zones and yield the CPU.
+			 */

For what? It would make your claim more clear.

+			if (need_resched()) {

compact_zone_order returns true state of contended only if it was lock contention
so it couldn't return true state of contended by need_resched so you made
need_resched check in here. It's fragile to me because it could be not a result
from ahead compact_zone_order call. More clear thing is compact_zone_order
should return zone_contended as enum, not bool and in here, you could check it.

It means you could return enum in compact_zone_order and make the result bool
in try_to_compact_pages.

+				/*
+				 * We might not have tried all the zones, so
+				 * be conservative and assume they are not
+				 * all lock contended.
+				 */
+				all_zones_contended = false;
+				cond_resched();
+				break;
+			}
  		}
  	}

-	/* If at least one zone wasn't deferred, we count a compaction stall */
-	if (!*deferred)
+	/*
+	 * If at least one zone wasn't deferred, we count a compaction stall
+	 * and we report if all zones that were tried were contended.
+	 */
+	if (!*deferred) {
  		count_compact_event(COMPACTSTALL);
+		*contended = all_zones_contended;

Why don't you initialize contended as *false* in function's intro?

+	}

  	return rc;
  }
diff --git a/mm/internal.h b/mm/internal.h
index a1b651b..2c187d2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -117,6 +117,13 @@ extern int user_min_free_kbytes;

  #if defined CONFIG_COMPACTION || defined CONFIG_CMA

+/* Used to signal whether compaction detected need_sched() or lock contention */
+enum compact_contended {
+	COMPACT_CONTENDED_NONE = 0, /* no contention detected */
+	COMPACT_CONTENDED_SCHED,    /* need_sched() was true */
+	COMPACT_CONTENDED_LOCK,     /* zone lock or lru_lock was contended */
+};
+
  /*
   * in mm/compaction.c
   */
@@ -144,10 +151,10 @@ struct compact_control {
  	int order;			/* order a direct compactor needs */
  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
  	struct zone *zone;
-	bool contended;			/* True if a lock was contended, or
-					 * need_resched() true during async
-					 * compaction
-					 */
+	enum compact_contended contended; /* Signal need_sched() or lock
+					   * contention detected during
+					   * compaction
+					   */
  };

  unsigned long
--

Anyway, most big concern is that you are changing current behavior as
I said earlier.

Well that behavior is pretty new, introduced recently by David's patches 
and now he himself is saying that it makes page fault THP allocations 
very unsuccessful and the need_resched() is not a good termination 
criterion. But OK, I can restrict this to khugepaged...

Old behavior in THP page fault when it consumes own timeslot was just
abort and fallback 4K page but with your patch, new behavior is
take a rest when it founds need_resched and goes to another round with
async, not sync compaction. I'm not sure we need another round with
async compaction at the cost of increasing latency rather than fallback
4 page.
>
It might be okay if the VMA has MADV_HUGEPAGE which is good hint to
indicate non-temporal VMA so latency would be trade-off but it's not
for temporal big memory allocation in HUGEPAGE_ALWAYS system.

Yeah that could be useful hint when we eventually move away from 
need_resched() to a fixed/tunable work per compaction attempt. But I'll 
rather not introduce it in the series.

If you really want to go this, could you show us numbers?

1. How many could we can be successful in direct compaction by this patch?

Doesn't seem to help. I'll drop it.

2. How long could we increase latency for temporal allocation
    for HUGEPAGE_ALWAYS system?

Don't have the data.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>