Re: isolate_freepages_block and excessive CPU usage by OSD process

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 28 Nov 2014 10:26:15 +0100

On 28.11.2014 9:03, Joonsoo Kim wrote:
On Tue, Nov 25, 2014 at 01:48:42AM +0400, Andrey Korolyov wrote:
On Sun, Nov 23, 2014 at 12:33 PM, Christian Marie <christian@xxxxxxxxx> wrote:
Here's an update:

Tried running 3.18.0-rc5 over the weekend to no avail. A load spike through
Ceph brings no perceived improvement over the chassis running 3.10 kernels.

Here is a graph of *system* CPU time (not user); note that 3.18 was a005.block:

http://ponies.io/raw/cluster.png

It is perhaps faring a little better than the chassis running 3.10, in
that it did not have min_free_kbytes raised to 2GB as the others did; instead
it was sitting around 90MB.

The perf recording did look a little different. Not sure if this was just the
luck of the draw in how the fractal rendering works:

http://ponies.io/raw/perf-3.10.png

Any pointers on how we can track this down? There are at least three of us
following this now, so we should have plenty of capacity for testing.

Checked against 3.16 (3.17 hung due to an unrelated problem); the issue
is present on single- and two-headed systems as well. Ceph users have
reported the problem on 3.17 too, so we are probably facing a
generic compaction issue.

Hello,

I didn't follow this discussion closely, but at a glance, this excessive
CPU usage by compaction looks related to the following fixes.

Could you test the following two patches?

If these fix your problem, I will resubmit the patches with proper commit
descriptions.

Thanks.

-------->8-------------
From 079f3f119f1e3cbe9d981e7d0cada94e0c532162 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Date: Fri, 28 Nov 2014 16:36:00 +0900
Subject: [PATCH 1/2] mm/compaction: fix wrong order check in
  compact_finished()

What we want to check here is whether there is a high-order free page
in the buddy list of another migratetype that we could steal without
causing fragmentation. But the current code checks cc->order, which is
the allocation request order, so this is wrong.

Without this fix, non-movable synchronous compaction below pageblock order
would not stop until compaction completes, because the migratetype of most
pageblocks is movable and cc->order is always below pageblock order
in this case.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
---
  mm/compaction.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b544d61..052194f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1082,7 +1082,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
  			return COMPACT_PARTIAL;
  
  		/* Job done if allocation would set block type */
-		if (cc->order >= pageblock_order && area->nr_free)
+		if (order >= pageblock_order && area->nr_free)
  			return COMPACT_PARTIAL;

Dang, good catch!
But I wonder, are MIGRATE_RESERVE pages counted towards area->nr_free?
It seems to me that they are, so can this check have false positives?
Hm, probably MIGRATE_CMA pages are the same case for unmovable allocations?

Vlastimil

  	}
  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .