On Tue, Nov 25, 2014 at 01:48:42AM +0400, Andrey Korolyov wrote:
> On Sun, Nov 23, 2014 at 12:33 PM, Christian Marie <christian@xxxxxxxxx> wrote:
> > Here's an update:
> >
> > Tried running 3.18.0-rc5 over the weekend to no avail. A load spike through
> > Ceph brings no perceived improvement over the chassis running 3.10 kernels.
> >
> > Here is a graph of *system* CPU time (not user); note that 3.18 was a005.block:
> >
> > http://ponies.io/raw/cluster.png
> >
> > It is perhaps faring a little better than the chassis running 3.10, in
> > that it did not have min_free_kbytes raised to 2GB as the others did;
> > instead it was sitting around 90MB.
> >
> > The perf recording did look a little different. Not sure if this was just
> > the luck of the draw in how the fractal rendering works:
> >
> > http://ponies.io/raw/perf-3.10.png
> >
> > Any pointers on how we can track this down? There are at least three of us
> > following this now, so we should have plenty of area to test.
>
> Checked against 3.16 (3.17 hung due to an unrelated problem); the issue
> is present on single- and two-headed systems as well. Ceph users have
> reported the problem on 3.17 too, so we are probably facing a generic
> compaction issue.

Hello,

I didn't follow this discussion closely, but, at a glance, this excessive
CPU usage by compaction looks related to the following fixes. Could you
test the two patches below? If they fix your problem, I will resubmit
them with proper commit descriptions.

Thanks.

-------->8-------------
>From 079f3f119f1e3cbe9d981e7d0cada94e0c532162 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Date: Fri, 28 Nov 2014 16:36:00 +0900
Subject: [PATCH 1/2] mm/compaction: fix wrong order check in compact_finished()

What we want to check here is whether there is a high-order free page in
the buddy list of another migratetype, so that we can steal it without
causing fragmentation. However, the current code checks cc->order, which
is the allocation request order, so the check is wrong.
Without this fix, non-movable synchronous compaction below pageblock
order would not stop until compaction completes, because the migratetype
of most pageblocks is movable and cc->order is always below pageblock
order in this case.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b544d61..052194f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1082,7 +1082,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
 			return COMPACT_PARTIAL;

 		/* Job done if allocation would set block type */
-		if (cc->order >= pageblock_order && area->nr_free)
+		if (order >= pageblock_order && area->nr_free)
 			return COMPACT_PARTIAL;
 	}
--
1.7.9.5

-------->8-------------
>From e3a5280747c4d0d12c67ad83f0f3dc5dce0ff11e Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Date: Fri, 28 Nov 2014 16:44:30 +0900
Subject: [PATCH 2/2] mm/page_alloc: don't do heavy compaction if we have a fallback method

SLUB sometimes uses a high-order allocation for the slab in order to
reduce fragmentation, but it has a fallback method because a high-order
allocation can be hard to satisfy and has a big impact on system
performance. The current allocation logic in the page allocator cannot
filter out such requests properly, so a high-order request from SLUB
invokes synchronous compaction, which is a really heavy hammer. SLUB
works well without high-order allocations, so this patch filters those
requests out.

At a quick grep, other allocation requests with these gfp flags also
have a fallback method. I don't know whether all of them do, but
checking for __GFP_NOWARN + __GFP_NORETRY looks like a reasonable way
to avoid the heavy hammer.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
---
 mm/page_alloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10310ad..e719f79 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2829,6 +2829,14 @@ rebalance:
 		goto rebalance;
 	} else {
 		/*
+		 * Certain gfp_mask flags indicate that the allocation
+		 * requestor has a proper fallback method, so we can
+		 * stop the hard work here. See mm/slub.c for an example.
+		 */
+		if (gfp_mask & __GFP_NORETRY && gfp_mask & __GFP_NOWARN)
+			goto nopage;
+
+		/*
 		 * High-order allocations do not necessarily loop after
 		 * direct reclaim and reclaim/compaction depends on compaction
 		 * being called after reclaim so call directly if necessary
--
1.7.9.5