On Fri, Nov 28, 2014 at 10:26:15AM +0100, Vlastimil Babka wrote: > On 28.11.2014 9:03, Joonsoo Kim wrote: > >On Tue, Nov 25, 2014 at 01:48:42AM +0400, Andrey Korolyov wrote: > >>On Sun, Nov 23, 2014 at 12:33 PM, Christian Marie <christian@xxxxxxxxx> wrote: > >>>Here's an update: > >>> > >>>Tried running 3.18.0-rc5 over the weekend to no avail. A load spike through > >>>Ceph brings no perceived improvement over the chassis running 3.10 kernels. > >>> > >>>Here is a graph of *system* cpu time (not user), note that 3.18 was a005.block: > >>> > >>>http://ponies.io/raw/cluster.png > >>> > >>>It is perhaps faring a little better that those chassis running the 3.10 in > >>>that it did not have min_free_kbytes raised to 2GB as the others did, instead > >>>it was sitting around 90MB. > >>> > >>>The perf recording did look a little different. Not sure if this was just the > >>>luck of the draw in how the fractal rendering works: > >>> > >>>http://ponies.io/raw/perf-3.10.png > >>> > >>>Any pointers on how we can track this down? There's at least three of us > >>>following at this now so we should have plenty of area to test. > >> > >>Checked against 3.16 (3.17 hanged for an unrelated problem), the issue > >>is presented for single- and two-headed systems as well. Ceph-users > >>reported presence of the problem for 3.17, so probably we are facing > >>generic compaction issue. > >> > >Hello, > > > >I didn't follow-up this discussion, but, at glance, this excessive CPU > >usage by compaction is related to following fixes. > > > >Could you test following two patches? > > > >If these fixes your problem, I will resumit patches with proper commit > >description. > > > >Thanks. > > > >-------->8------------- > > From 079f3f119f1e3cbe9d981e7d0cada94e0c532162 Mon Sep 17 00:00:00 2001 > >From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > >Date: Fri, 28 Nov 2014 16:36:00 +0900 > >Subject: [PATCH 1/2] mm/compaction: fix wrong order check in > > compact_finished() > > > >What we want to check here is whether there is highorder freepage > >in buddy list of other migratetype in order to steal it without > >fragmentation. But, current code just checks cc->order which means > >allocation request order. So, this is wrong. > > > >Without this fix, non-movable synchronous compaction below pageblock order > >would not stopped until compaction complete, because migratetype of most > >pageblocks are movable and cc->order is always below than pageblock order > >in this case. > > > >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > >--- > > mm/compaction.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > >diff --git a/mm/compaction.c b/mm/compaction.c > >index b544d61..052194f 100644 > >--- a/mm/compaction.c > >+++ b/mm/compaction.c > >@@ -1082,7 +1082,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc, > > return COMPACT_PARTIAL; > > /* Job done if allocation would set block type */ > >- if (cc->order >= pageblock_order && area->nr_free) > >+ if (order >= pageblock_order && area->nr_free) > > return COMPACT_PARTIAL; > > Dang, good catch! > But I wonder, are MIGRATE_RESERVE pages counted towards area->nr_free? > Seems to me that they are, so this check can have false positives? > Hm probably for unmovable allocation, MIGRATE_CMA pages is the same case? > Hello, Althoth MIGRATE_RESERVE are counted for area->nr_free, at this moment, there is no freepage on MIGRATE_RESERVE. It would be used already before triggering compaction. In case of MIGRATE_CMA, false positives are possible. But, it also broken on __zone_watermark_ok(). Without area->nr_free_cma, we can't fix inaccurate check. Please see following link. https://lkml.org/lkml/2014/6/2/1 Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>